[llvm-branch-commits] [llvm] [AMDGPU] always emit a soft wait even if it is trivially ~0 (PR #147257)
Sameer Sahasrabuddhe via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Mon Jul 7 02:11:42 PDT 2025
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/147257
The memory legalizer is currently responsible for emitting wait instructions at ordering operations such as acquire and release. It tries to be efficient by emitting waits only when required. In particular, it does not emit a wait on vmcnt at workgroup scope since that ordering is already guaranteed by the architecture. This is now incorrect, because direct loads to LDS have an LDS component that needs explicit ordering on vmcnt. However, always emitting a wait on vmcnt would be inefficient: the majority of programs do not use direct loads to LDS, and the extra wait would affect all workgroup-scope operations.
As a first step towards fixing this, the memory legalizer now emits a soft wait instruction even if all counts are trivially ~0. This is a placeholder that the SIInsertWaitcnts pass will either optimize away or strengthen, based on its analysis of whether direct loads to LDS are pending at this point in the program.
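The placeholder idea described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `SoftWait` and `resolveSoftWait` are hypothetical names invented for illustration, the two booleans stand in for real counter values, and the logic is not the SIInsertWaitcnts implementation. It shows only the two outcomes named in the description, dropping a trivial placeholder or strengthening it when a direct load to LDS is pending.

```cpp
#include <optional>

// Hypothetical, simplified model of a soft wait; not an LLVM data structure.
struct SoftWait {
  bool WaitVmcnt;   // true if the wait requires vmcnt to reach 0
  bool WaitLgkmcnt; // true if the wait requires lgkmcnt to reach 0

  // A wait that constrains no counter corresponds to "trivially ~0".
  bool isTrivial() const { return !WaitVmcnt && !WaitLgkmcnt; }
};

// Decide what happens to a soft wait placeholder.
// Returns the wait to keep, or std::nullopt if it can be removed.
// DirectLoadToLDSPending is an assumed analysis result supplied by the caller.
std::optional<SoftWait> resolveSoftWait(SoftWait W,
                                        bool DirectLoadToLDSPending) {
  // Strengthen: a pending direct load to LDS needs explicit ordering on vmcnt.
  if (DirectLoadToLDSPending)
    W.WaitVmcnt = true;

  // Optimize away: nothing to wait for, so the placeholder disappears.
  if (W.isTrivial())
    return std::nullopt;

  return W;
}
```

In this model, a trivial placeholder costs nothing in programs that never issue a direct load to LDS, which matches the efficiency concern raised in the description.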
From 6e30f375f14ce77ebd75d7e65f63601925ff1481 Mon Sep 17 00:00:00 2001
From: Sameer Sahasrabuddhe <sameer.sahasrabuddhe at amd.com>
Date: Fri, 4 Jul 2025 12:20:58 +0530
Subject: [PATCH] [AMDGPU] always emit a soft wait even if it is trivially ~0
The memory legalizer is currently responsible for emitting wait instructions at
ordering operations such as acquire and release. It tries to be efficient by
emitting waits only when required. In particular, it does not emit a wait on
vmcnt at workgroup scope since that ordering is already guaranteed by the
architecture. This is now incorrect, because direct loads to LDS have an LDS
component that needs explicit ordering on vmcnt. However, always emitting a
wait on vmcnt would be inefficient: the majority of programs do not use direct
loads to LDS, and the extra wait would affect all workgroup-scope operations.
As a first step towards fixing this, the memory legalizer now emits a soft wait
instruction even if all counts are trivially ~0. This is a placeholder that the
SIInsertWaitcnts pass will either optimize away or strengthen, based on its
analysis of whether direct loads to LDS are pending at this point in the
program.
---
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 58 +-
.../CodeGen/AMDGPU/GlobalISel/flat-scratch.ll | 12 +-
.../memory-legalizer-atomic-fence.ll | 112 ++
...tor-flatscratchinit-undefined-behavior2.ll | 4 +-
.../CodeGen/AMDGPU/call-argument-types.ll | 4 +-
.../test/CodeGen/AMDGPU/dynamic_stackalloc.ll | 12 +-
llvm/test/CodeGen/AMDGPU/flat-scratch.ll | 12 +-
llvm/test/CodeGen/AMDGPU/function-args.ll | 132 +-
.../CodeGen/AMDGPU/indirect-addressing-si.ll | 2 +
.../kernel-vgpr-spill-mubuf-with-voffset.ll | 1 +
.../CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll | 1 +
.../memory-legalizer-fence-mmra-global.ll | 64 +
.../memory-legalizer-fence-mmra-local.ll | 174 +-
.../CodeGen/AMDGPU/memory-legalizer-fence.ll | 224 ++-
.../AMDGPU/memory-legalizer-flat-agent.ll | 192 +-
.../memory-legalizer-flat-singlethread.ll | 1420 +++++++++++++++
.../AMDGPU/memory-legalizer-flat-system.ll | 192 +-
.../AMDGPU/memory-legalizer-flat-volatile.ll | 16 +-
.../AMDGPU/memory-legalizer-flat-wavefront.ll | 1410 +++++++++++++++
.../AMDGPU/memory-legalizer-flat-workgroup.ll | 644 ++++++-
.../AMDGPU/memory-legalizer-global-agent.ll | 192 ++
.../memory-legalizer-global-singlethread.ll | 1204 ++++++++++++-
.../AMDGPU/memory-legalizer-global-system.ll | 168 ++
.../memory-legalizer-global-volatile.ll | 15 +-
.../memory-legalizer-global-wavefront.ll | 1204 ++++++++++++-
.../memory-legalizer-global-workgroup.ll | 788 ++++++++-
.../AMDGPU/memory-legalizer-local-agent.ll | 1038 +++++++++--
.../memory-legalizer-local-nontemporal.ll | 5 +-
.../memory-legalizer-local-singlethread.ll | 1548 +++++++++++++++++
.../AMDGPU/memory-legalizer-local-system.ll | 1038 +++++++++--
.../AMDGPU/memory-legalizer-local-volatile.ll | 34 +-
.../memory-legalizer-local-wavefront.ll | 1548 +++++++++++++++++
.../memory-legalizer-local-workgroup.ll | 1038 +++++++++--
.../CodeGen/AMDGPU/memory-legalizer-local.mir | 31 +
.../memory-legalizer-private-volatile.ll | 12 +
.../AMDGPU/memory-legalizer-region.mir | 31 +
...uf-legalize-operands-non-ptr-intrinsics.ll | 8 +-
.../CodeGen/AMDGPU/mubuf-legalize-operands.ll | 8 +-
.../CodeGen/AMDGPU/stacksave_stackrestore.ll | 6 +
llvm/test/CodeGen/AMDGPU/trap-abis.ll | 5 +
.../wait-before-stores-with-scope_sys.mir | 1 +
41 files changed, 13845 insertions(+), 763 deletions(-)
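Note on the test updates: the `S_WAITCNT_soft` immediates that appear in the new MIR expectations (3967, 65407 and 65527) are what "trivially ~0" looks like once every counter field is set to its maximum. The sketch below reproduces those values from the bit layouts of the `s_waitcnt` immediate as I understand them for each generation. The `Field` and `WaitcntLayout` types and the `trivialWaitcnt` helper are hypothetical and exist only for this illustration; the patch itself uses `AMDGPU::encodeWaitcnt` together with the `get*BitMask` helpers.

```cpp
#include <cstdint>

// Position and size of one counter field in the s_waitcnt immediate.
struct Field {
  unsigned Shift;
  unsigned Width; // 0 means the field does not exist in this layout
};

// Hypothetical layout description, for illustration only.
struct WaitcntLayout {
  Field VmcntLo;
  Field VmcntHi;
  Field Expcnt;
  Field Lgkmcnt;
};

// All bits of a field set, shifted into place.
constexpr uint32_t maxField(Field F) {
  return ((1u << F.Width) - 1u) << F.Shift;
}

// Immediate in which every counter is at its maximum, i.e. "wait for nothing".
constexpr uint32_t trivialWaitcnt(const WaitcntLayout &L) {
  return maxField(L.VmcntLo) | maxField(L.VmcntHi) | maxField(L.Expcnt) |
         maxField(L.Lgkmcnt);
}

// Assumed layouts: {VmcntLo, VmcntHi, Expcnt, Lgkmcnt} as {shift, width}.
constexpr WaitcntLayout Gfx6To8{{0, 4}, {0, 0}, {4, 3}, {8, 4}};
constexpr WaitcntLayout Gfx10{{0, 4}, {14, 2}, {4, 3}, {8, 6}};
constexpr WaitcntLayout Gfx11{{10, 6}, {0, 0}, {0, 3}, {4, 6}};

// These match the immediates in the updated GFX6/GFX8, GFX10 and GFX11 checks.
static_assert(trivialWaitcnt(Gfx6To8) == 3967, "0xF7F");
static_assert(trivialWaitcnt(Gfx10) == 65407, "0xFF7F");
static_assert(trivialWaitcnt(Gfx11) == 65527, "0xFFF7");
```

The single zero bit in each value (bit 7 before GFX11, bit 3 on GFX11) is an unused position in the assumed layout, which is why the immediates are not simply all-ones.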
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 3212060f303a5..f015d3ad7811e 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -1074,8 +1074,6 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
SIAtomicAddrSpace AddrSpace, SIMemOp Op,
bool IsCrossAddrSpaceOrdering, Position Pos,
AtomicOrdering Order) const {
- bool Changed = false;
-
MachineBasicBlock &MBB = *MI->getParent();
DebugLoc DL = MI->getDebugLoc();
@@ -1149,21 +1147,19 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
}
}
- if (VMCnt || LGKMCnt) {
- unsigned WaitCntImmediate =
- AMDGPU::encodeWaitcnt(IV,
- VMCnt ? 0 : getVmcntBitMask(IV),
- getExpcntBitMask(IV),
- LGKMCnt ? 0 : getLgkmcntBitMask(IV));
- BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAITCNT_soft))
- .addImm(WaitCntImmediate);
- Changed = true;
- }
+ // Always emit a soft wait count, even if it is trivially ~0. SIInsertWaitcnts
+ // will later use this marker to add additional waits such as those required
+ // from direct load to LDS (formerly known as LDS DMA).
+ unsigned WaitCntImmediate = AMDGPU::encodeWaitcnt(
+ IV, VMCnt ? 0 : getVmcntBitMask(IV), getExpcntBitMask(IV),
+ LGKMCnt ? 0 : getLgkmcntBitMask(IV));
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAITCNT_soft))
+ .addImm(WaitCntImmediate);
if (Pos == Position::AFTER)
--MI;
- return Changed;
+ return true;
}
bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
@@ -1966,8 +1962,6 @@ bool SIGfx10CacheControl::insertWait(MachineBasicBlock::iterator &MI,
SIAtomicAddrSpace AddrSpace, SIMemOp Op,
bool IsCrossAddrSpaceOrdering,
Position Pos, AtomicOrdering Order) const {
- bool Changed = false;
-
MachineBasicBlock &MBB = *MI->getParent();
DebugLoc DL = MI->getDebugLoc();
@@ -2057,28 +2051,25 @@ bool SIGfx10CacheControl::insertWait(MachineBasicBlock::iterator &MI,
}
}
- if (VMCnt || LGKMCnt) {
- unsigned WaitCntImmediate =
- AMDGPU::encodeWaitcnt(IV,
- VMCnt ? 0 : getVmcntBitMask(IV),
- getExpcntBitMask(IV),
- LGKMCnt ? 0 : getLgkmcntBitMask(IV));
- BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAITCNT_soft))
- .addImm(WaitCntImmediate);
- Changed = true;
- }
+ // Always emit a soft wait count, even if it is trivially ~0. SIInsertWaitcnts
+ // will later use this marker to add additional waits such as those required
+ // from direct load to LDS (formerly known as LDS DMA).
+ unsigned WaitCntImmediate = AMDGPU::encodeWaitcnt(
+ IV, VMCnt ? 0 : getVmcntBitMask(IV), getExpcntBitMask(IV),
+ LGKMCnt ? 0 : getLgkmcntBitMask(IV));
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAITCNT_soft))
+ .addImm(WaitCntImmediate);
if (VSCnt) {
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAITCNT_VSCNT_soft))
.addReg(AMDGPU::SGPR_NULL, RegState::Undef)
.addImm(0);
- Changed = true;
}
if (Pos == Position::AFTER)
--MI;
- return Changed;
+ return true;
}
bool SIGfx10CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
@@ -2287,8 +2278,6 @@ bool SIGfx12CacheControl::insertWait(MachineBasicBlock::iterator &MI,
SIAtomicAddrSpace AddrSpace, SIMemOp Op,
bool IsCrossAddrSpaceOrdering,
Position Pos, AtomicOrdering Order) const {
- bool Changed = false;
-
MachineBasicBlock &MBB = *MI->getParent();
DebugLoc DL = MI->getDebugLoc();
@@ -2372,23 +2361,26 @@ bool SIGfx12CacheControl::insertWait(MachineBasicBlock::iterator &MI,
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAIT_SAMPLECNT_soft)).addImm(0);
}
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAIT_LOADCNT_soft)).addImm(0);
- Changed = true;
+ } else {
+ // Always emit a soft wait count, even if it is trivially ~0.
+ // SIInsertWaitcnts will later use this marker to add additional waits such
+ // as those required from direct load to LDS (formerly known as LDS DMA).
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAIT_LOADCNT_soft))
+ .addImm(getLoadcntBitMask(IV));
}
if (STORECnt) {
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAIT_STORECNT_soft)).addImm(0);
- Changed = true;
}
if (DSCnt) {
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_WAIT_DSCNT_soft)).addImm(0);
- Changed = true;
}
if (Pos == Position::AFTER)
--MI;
- return Changed;
+ return true;
}
bool SIGfx12CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
index 8a80afd4a768f..1bbbec977b714 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
@@ -880,8 +880,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1
; GFX10-NEXT: v_add_nc_u32_e32 v0, 0x100, v0
; GFX10-NEXT: scratch_store_dword v0, v2, off offset:128
-; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_lshl_b32 s0, s0, 7
; GFX10-NEXT: s_add_u32 s0, 0x100, s0
; GFX10-NEXT: v_add_nc_u32_e32 v1, s0, v1
@@ -921,8 +921,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_4) | instid1(SALU_CYCLE_1)
; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1
; GFX11-NEXT: scratch_store_b32 v0, v2, off offset:384 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_lshl_b32 s0, s0, 7
; GFX11-NEXT: s_add_u32 s0, 0x100, s0
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
@@ -991,8 +991,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; UNALIGNED_GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1
; UNALIGNED_GFX10-NEXT: v_add_nc_u32_e32 v0, 0x100, v0
; UNALIGNED_GFX10-NEXT: scratch_store_dword v0, v2, off offset:128
-; UNALIGNED_GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; UNALIGNED_GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; UNALIGNED_GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; UNALIGNED_GFX10-NEXT: s_lshl_b32 s0, s0, 7
; UNALIGNED_GFX10-NEXT: s_add_u32 s0, 0x100, s0
; UNALIGNED_GFX10-NEXT: v_add_nc_u32_e32 v1, s0, v1
@@ -1032,8 +1032,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; UNALIGNED_GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_4) | instid1(SALU_CYCLE_1)
; UNALIGNED_GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1
; UNALIGNED_GFX11-NEXT: scratch_store_b32 v0, v2, off offset:384 dlc
-; UNALIGNED_GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; UNALIGNED_GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; UNALIGNED_GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; UNALIGNED_GFX11-NEXT: s_lshl_b32 s0, s0, 7
; UNALIGNED_GFX11-NEXT: s_add_u32 s0, 0x100, s0
; UNALIGNED_GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
@@ -1520,8 +1520,8 @@ define amdgpu_kernel void @store_load_vindex_large_offset_kernel(i32 %n) {
; GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1
; GFX10-NEXT: v_add_nc_u32_e32 v0, 0x4004, v0
; GFX10-NEXT: scratch_store_dword v0, v2, off offset:128
-; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_lshl_b32 s0, s0, 7
; GFX10-NEXT: s_add_u32 s0, 0x4004, s0
; GFX10-NEXT: v_add_nc_u32_e32 v1, s0, v1
@@ -1633,8 +1633,8 @@ define amdgpu_kernel void @store_load_vindex_large_offset_kernel(i32 %n) {
; UNALIGNED_GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1
; UNALIGNED_GFX10-NEXT: v_add_nc_u32_e32 v0, 0x4004, v0
; UNALIGNED_GFX10-NEXT: scratch_store_dword v0, v2, off offset:128
-; UNALIGNED_GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; UNALIGNED_GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; UNALIGNED_GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; UNALIGNED_GFX10-NEXT: s_lshl_b32 s0, s0, 7
; UNALIGNED_GFX10-NEXT: s_add_u32 s0, 0x4004, s0
; UNALIGNED_GFX10-NEXT: v_add_nc_u32_e32 v1, s0, v1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll
index 66037615f0ba0..ea6a5d5e74d52 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll
@@ -199,26 +199,32 @@ entry:
define amdgpu_kernel void @singlethread_one_as_acquire() #0 {
; GFX6-LABEL: name: singlethread_one_as_acquire
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_one_as_acquire
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_one_as_acquire
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_one_as_acquire
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_one_as_acquire
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_one_as_acquire
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread-one-as") acquire
@@ -228,26 +234,32 @@ entry:
define amdgpu_kernel void @singlethread_one_as_release() #0 {
; GFX6-LABEL: name: singlethread_one_as_release
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_one_as_release
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_one_as_release
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_one_as_release
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_one_as_release
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_one_as_release
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread-one-as") release
@@ -257,26 +269,32 @@ entry:
define amdgpu_kernel void @singlethread_one_as_acq_rel() #0 {
; GFX6-LABEL: name: singlethread_one_as_acq_rel
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_one_as_acq_rel
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_one_as_acq_rel
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_one_as_acq_rel
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_one_as_acq_rel
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_one_as_acq_rel
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread-one-as") acq_rel
@@ -286,26 +304,32 @@ entry:
define amdgpu_kernel void @singlethread_one_as_seq_cst() #0 {
; GFX6-LABEL: name: singlethread_one_as_seq_cst
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_one_as_seq_cst
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_one_as_seq_cst
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_one_as_seq_cst
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_one_as_seq_cst
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_one_as_seq_cst
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread-one-as") seq_cst
@@ -501,10 +525,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acquire() #0 {
; GFX6-LABEL: name: workgroup_one_as_acquire
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: workgroup_one_as_acquire
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: workgroup_one_as_acquire
@@ -516,6 +542,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire() #0 {
;
; GFX10CU-LABEL: name: workgroup_one_as_acquire
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: workgroup_one_as_acquire
@@ -527,6 +554,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire() #0 {
;
; GFX11CU-LABEL: name: workgroup_one_as_acquire
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("workgroup-one-as") acquire
@@ -536,10 +564,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_release() #0 {
; GFX6-LABEL: name: workgroup_one_as_release
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: workgroup_one_as_release
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: workgroup_one_as_release
@@ -550,6 +580,7 @@ define amdgpu_kernel void @workgroup_one_as_release() #0 {
;
; GFX10CU-LABEL: name: workgroup_one_as_release
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: workgroup_one_as_release
@@ -560,6 +591,7 @@ define amdgpu_kernel void @workgroup_one_as_release() #0 {
;
; GFX11CU-LABEL: name: workgroup_one_as_release
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("workgroup-one-as") release
@@ -569,10 +601,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acq_rel() #0 {
; GFX6-LABEL: name: workgroup_one_as_acq_rel
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: workgroup_one_as_acq_rel
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: workgroup_one_as_acq_rel
@@ -584,6 +618,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel() #0 {
;
; GFX10CU-LABEL: name: workgroup_one_as_acq_rel
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: workgroup_one_as_acq_rel
@@ -595,6 +630,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel() #0 {
;
; GFX11CU-LABEL: name: workgroup_one_as_acq_rel
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("workgroup-one-as") acq_rel
@@ -604,10 +640,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_seq_cst() #0 {
; GFX6-LABEL: name: workgroup_one_as_seq_cst
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: workgroup_one_as_seq_cst
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: workgroup_one_as_seq_cst
@@ -619,6 +657,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst() #0 {
;
; GFX10CU-LABEL: name: workgroup_one_as_seq_cst
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: workgroup_one_as_seq_cst
@@ -630,6 +669,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst() #0 {
;
; GFX11CU-LABEL: name: workgroup_one_as_seq_cst
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("workgroup-one-as") seq_cst
@@ -639,26 +679,32 @@ entry:
define amdgpu_kernel void @wavefront_one_as_acquire() #0 {
; GFX6-LABEL: name: wavefront_one_as_acquire
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_one_as_acquire
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_one_as_acquire
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_one_as_acquire
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_one_as_acquire
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_one_as_acquire
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront-one-as") acquire
@@ -668,26 +714,32 @@ entry:
define amdgpu_kernel void @wavefront_one_as_release() #0 {
; GFX6-LABEL: name: wavefront_one_as_release
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_one_as_release
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_one_as_release
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_one_as_release
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_one_as_release
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_one_as_release
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront-one-as") release
@@ -697,26 +749,32 @@ entry:
define amdgpu_kernel void @wavefront_one_as_acq_rel() #0 {
; GFX6-LABEL: name: wavefront_one_as_acq_rel
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_one_as_acq_rel
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_one_as_acq_rel
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_one_as_acq_rel
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_one_as_acq_rel
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_one_as_acq_rel
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront-one-as") acq_rel
@@ -726,26 +784,32 @@ entry:
define amdgpu_kernel void @wavefront_one_as_seq_cst() #0 {
; GFX6-LABEL: name: wavefront_one_as_seq_cst
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_one_as_seq_cst
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_one_as_seq_cst
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_one_as_seq_cst
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_one_as_seq_cst
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_one_as_seq_cst
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront-one-as") seq_cst
@@ -941,26 +1005,32 @@ entry:
define amdgpu_kernel void @singlethread_acquire() #0 {
; GFX6-LABEL: name: singlethread_acquire
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_acquire
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_acquire
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_acquire
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_acquire
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_acquire
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread") acquire
@@ -970,26 +1040,32 @@ entry:
define amdgpu_kernel void @singlethread_release() #0 {
; GFX6-LABEL: name: singlethread_release
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_release
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_release
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_release
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_release
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_release
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread") release
@@ -999,26 +1075,32 @@ entry:
define amdgpu_kernel void @singlethread_acq_rel() #0 {
; GFX6-LABEL: name: singlethread_acq_rel
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_acq_rel
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_acq_rel
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_acq_rel
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_acq_rel
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_acq_rel
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread") acq_rel
@@ -1028,26 +1110,32 @@ entry:
define amdgpu_kernel void @singlethread_seq_cst() #0 {
; GFX6-LABEL: name: singlethread_seq_cst
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: singlethread_seq_cst
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: singlethread_seq_cst
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: singlethread_seq_cst
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: singlethread_seq_cst
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: singlethread_seq_cst
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("singlethread") seq_cst
@@ -1397,26 +1485,32 @@ entry:
define amdgpu_kernel void @wavefront_acquire() #0 {
; GFX6-LABEL: name: wavefront_acquire
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_acquire
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_acquire
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_acquire
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_acquire
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_acquire
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront") acquire
@@ -1426,26 +1520,32 @@ entry:
define amdgpu_kernel void @wavefront_release() #0 {
; GFX6-LABEL: name: wavefront_release
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_release
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_release
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_release
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_release
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_release
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront") release
@@ -1455,26 +1555,32 @@ entry:
define amdgpu_kernel void @wavefront_acq_rel() #0 {
; GFX6-LABEL: name: wavefront_acq_rel
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_acq_rel
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_acq_rel
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_acq_rel
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_acq_rel
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_acq_rel
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront") acq_rel
@@ -1484,26 +1590,32 @@ entry:
define amdgpu_kernel void @wavefront_seq_cst() #0 {
; GFX6-LABEL: name: wavefront_seq_cst
; GFX6: bb.0.entry:
+ ; GFX6-NEXT: S_WAITCNT_soft 3967
; GFX6-NEXT: S_ENDPGM 0
;
; GFX8-LABEL: name: wavefront_seq_cst
; GFX8: bb.0.entry:
+ ; GFX8-NEXT: S_WAITCNT_soft 3967
; GFX8-NEXT: S_ENDPGM 0
;
; GFX10WGP-LABEL: name: wavefront_seq_cst
; GFX10WGP: bb.0.entry:
+ ; GFX10WGP-NEXT: S_WAITCNT_soft 65407
; GFX10WGP-NEXT: S_ENDPGM 0
;
; GFX10CU-LABEL: name: wavefront_seq_cst
; GFX10CU: bb.0.entry:
+ ; GFX10CU-NEXT: S_WAITCNT_soft 65407
; GFX10CU-NEXT: S_ENDPGM 0
;
; GFX11WGP-LABEL: name: wavefront_seq_cst
; GFX11WGP: bb.0.entry:
+ ; GFX11WGP-NEXT: S_WAITCNT_soft 65527
; GFX11WGP-NEXT: S_ENDPGM 0
;
; GFX11CU-LABEL: name: wavefront_seq_cst
; GFX11CU: bb.0.entry:
+ ; GFX11CU-NEXT: S_WAITCNT_soft 65527
; GFX11CU-NEXT: S_ENDPGM 0
entry:
fence syncscope("wavefront") seq_cst
diff --git a/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior2.ll b/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior2.ll
index 51caa84450ff3..a713450809ad0 100644
--- a/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior2.ll
+++ b/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior2.ll
@@ -93,8 +93,8 @@ define void @with_private_to_flat_addrspacecast(ptr addrspace(5) %ptr) #0 {
; GFX10-NEXT: v_cndmask_b32_e64 v1, 0, s5, vcc_lo
; GFX10-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc_lo
; GFX10-NEXT: flat_store_dword v[0:1], v2
-; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]
%stof = addrspacecast ptr addrspace(5) %ptr to ptr
store volatile i32 0, ptr %stof
@@ -723,8 +723,8 @@ define void @calls_intrin_ascast(ptr addrspace(3) %ptr) #0 {
; GFX10-NEXT: v_mov_b32_e32 v2, 7
; GFX10-NEXT: v_mov_b32_e32 v1, s5
; GFX10-NEXT: flat_store_dword v[0:1], v2
-; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]
%1 = call ptr @llvm.amdgcn.addrspacecast.nonnull.p0.p3(ptr addrspace(3) %ptr)
store volatile i32 7, ptr %1, align 4
diff --git a/llvm/test/CodeGen/AMDGPU/call-argument-types.ll b/llvm/test/CodeGen/AMDGPU/call-argument-types.ll
index acf2f8add7670..9feb029b9bed3 100644
--- a/llvm/test/CodeGen/AMDGPU/call-argument-types.ll
+++ b/llvm/test/CodeGen/AMDGPU/call-argument-types.ll
@@ -5064,8 +5064,8 @@ define amdgpu_kernel void @test_call_external_void_func_sret_struct_i8_i32_byval
; GFX11-TRUE16-NEXT: s_mov_b32 s2, -1
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(1)
; GFX11-TRUE16-NEXT: buffer_store_b8 v0, off, s[0:3], 0 dlc
-; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(0)
+; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: buffer_store_b32 v1, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_nop 0
@@ -5095,8 +5095,8 @@ define amdgpu_kernel void @test_call_external_void_func_sret_struct_i8_i32_byval
; GFX11-FAKE16-NEXT: s_mov_b32 s2, -1
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(1)
; GFX11-FAKE16-NEXT: buffer_store_b8 v0, off, s[0:3], 0 dlc
-; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(0)
+; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: buffer_store_b32 v1, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_nop 0
diff --git a/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll b/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
index b0439b1f7968f..9aa19555bcbe0 100644
--- a/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+++ b/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
@@ -51,8 +51,8 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_uniform(i32 %n) {
; GFX11-SDAG-NEXT: s_mov_b32 s33, 0
; GFX11-SDAG-NEXT: s_mov_b32 s1, s32
; GFX11-SDAG-NEXT: scratch_store_b32 off, v0, s1 dlc
-; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2
; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 15
@@ -70,8 +70,8 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_uniform(i32 %n) {
; GFX11-GISEL-NEXT: s_mov_b32 s33, 0
; GFX11-GISEL-NEXT: s_mov_b32 s0, s32
; GFX11-GISEL-NEXT: scratch_store_b32 off, v0, s0 dlc
-; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-GISEL-NEXT: s_lshl2_add_u32 s1, s1, 15
; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-GISEL-NEXT: s_and_b32 s1, s1, -16
@@ -135,8 +135,8 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_uniform_over_aligned(i
; GFX11-SDAG-NEXT: s_mov_b32 s33, 0
; GFX11-SDAG-NEXT: s_and_b32 s1, s1, 0xfffff000
; GFX11-SDAG-NEXT: scratch_store_b32 off, v0, s1 dlc
-; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2
; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 15
@@ -155,8 +155,8 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_uniform_over_aligned(i
; GFX11-GISEL-NEXT: s_mov_b32 s33, 0
; GFX11-GISEL-NEXT: s_and_b32 s1, s1, 0xfffff000
; GFX11-GISEL-NEXT: scratch_store_b32 off, v0, s1 dlc
-; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-GISEL-NEXT: s_lshl2_add_u32 s0, s0, 15
; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-GISEL-NEXT: s_and_b32 s0, s0, -16
@@ -216,8 +216,8 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_uniform_under_aligned(
; GFX11-SDAG-NEXT: s_mov_b32 s33, 0
; GFX11-SDAG-NEXT: s_mov_b32 s1, s32
; GFX11-SDAG-NEXT: scratch_store_b32 off, v0, s1 dlc
-; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2
; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 15
@@ -235,8 +235,8 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_uniform_under_aligned(
; GFX11-GISEL-NEXT: s_mov_b32 s33, 0
; GFX11-GISEL-NEXT: s_mov_b32 s0, s32
; GFX11-GISEL-NEXT: scratch_store_b32 off, v0, s0 dlc
-; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-GISEL-NEXT: s_lshl2_add_u32 s1, s1, 15
; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-GISEL-NEXT: s_and_b32 s1, s1, -16
diff --git a/llvm/test/CodeGen/AMDGPU/flat-scratch.ll b/llvm/test/CodeGen/AMDGPU/flat-scratch.ll
index b5e579b78a59c..8999e91208b3a 100644
--- a/llvm/test/CodeGen/AMDGPU/flat-scratch.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat-scratch.ll
@@ -1946,8 +1946,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_add_nc_u32_e32 v1, 0x100, v0
; GFX10-NEXT: scratch_store_dword v1, v2, off offset:128
-; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_lshl_b32 s0, s0, 7
; GFX10-NEXT: s_addk_i32 s0, 0x100
; GFX10-NEXT: v_sub_nc_u32_e32 v0, s0, v0
@@ -1963,8 +1963,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:384 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_lshl_b32 s0, s0, 7
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-NEXT: s_addk_i32 s0, 0x100
@@ -2080,8 +2080,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
; GFX1030-PAL-NEXT: v_add_nc_u32_e32 v1, 0x100, v0
; GFX1030-PAL-NEXT: scratch_store_dword v1, v2, off offset:128
-; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
+; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 7
; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x100
; GFX1030-PAL-NEXT: v_sub_nc_u32_e32 v0, s0, v0
@@ -2097,8 +2097,8 @@ define amdgpu_kernel void @store_load_vindex_small_offset_kernel(i32 %n) {
; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:384 dlc
-; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 7
; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; GFX11-PAL-NEXT: s_addk_i32 s0, 0x100
@@ -3242,8 +3242,8 @@ define amdgpu_kernel void @store_load_vindex_large_offset_kernel(i32 %n) {
; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0
; GFX10-NEXT: scratch_store_dword v1, v2, off offset:128
-; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_lshl_b32 s0, s0, 7
; GFX10-NEXT: s_addk_i32 s0, 0x4004
; GFX10-NEXT: v_sub_nc_u32_e32 v0, s0, v0
@@ -3379,8 +3379,8 @@ define amdgpu_kernel void @store_load_vindex_large_offset_kernel(i32 %n) {
; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
; GFX1030-PAL-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0
; GFX1030-PAL-NEXT: scratch_store_dword v1, v2, off offset:128
-; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
+; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 7
; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004
; GFX1030-PAL-NEXT: v_sub_nc_u32_e32 v0, s0, v0
diff --git a/llvm/test/CodeGen/AMDGPU/function-args.ll b/llvm/test/CodeGen/AMDGPU/function-args.ll
index a901d7f97eb37..d4fd7fe1adc7d 100644
--- a/llvm/test/CodeGen/AMDGPU/function-args.ll
+++ b/llvm/test/CodeGen/AMDGPU/function-args.ll
@@ -2943,11 +2943,11 @@ define void @void_func_v32i32_i32_i64(<32 x i32> %arg0, i32 %arg1, i64 %arg2) #0
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: buffer_store_b32 v34, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b32 v34, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b64 v[32:33], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -3072,21 +3072,21 @@ define void @void_func_v32i32_i1_i8_i16_bf16(<32 x i32> %arg0, i1 %arg1, i8 %arg
; GFX11-TRUE16-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(4)
+; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: v_and_b32_e32 v0, 1, v36
; GFX11-TRUE16-NEXT: buffer_store_b8 v0, off, s[0:3], 0 dlc
-; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(3)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v32, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v32, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(2)
-; GFX11-TRUE16-NEXT: buffer_store_b16 v33, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b16 v33, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(1)
-; GFX11-TRUE16-NEXT: buffer_store_b16 v34, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b16 v34, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(0)
+; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: buffer_store_b16 v35, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_setpc_b64 s[30:31]
@@ -3111,8 +3111,8 @@ define void @void_func_v32i32_i1_i8_i16_bf16(<32 x i32> %arg0, i1 %arg1, i8 %arg
; GFX11-FAKE16-NEXT: buffer_store_b128 v[20:23], off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: buffer_store_b128 v[16:19], off, s[0:3], 0 dlc
-; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(4)
+; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: v_and_b32_e32 v16, 1, v32
; GFX11-FAKE16-NEXT: buffer_store_b128 v[12:15], off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
@@ -3123,17 +3123,17 @@ define void @void_func_v32i32_i1_i8_i16_bf16(<32 x i32> %arg0, i1 %arg1, i8 %arg
; GFX11-FAKE16-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: buffer_store_b8 v16, off, s[0:3], 0 dlc
-; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(3)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v33, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v33, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(2)
-; GFX11-FAKE16-NEXT: buffer_store_b16 v34, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b16 v34, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(1)
-; GFX11-FAKE16-NEXT: buffer_store_b16 v35, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b16 v35, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(0)
+; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: buffer_store_b16 v36, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_setpc_b64 s[30:31]
@@ -3207,11 +3207,11 @@ define void @void_func_v32i32_v2i32_v2f32(<32 x i32> %arg0, <2 x i32> %arg1, <2
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: buffer_store_b64 v[32:33], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b64 v[32:33], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b64 v[34:35], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -3357,17 +3357,17 @@ define void @void_func_v32i32_v2i16_v2f16_v2bf16_v4bf16(<32 x i32> %arg0, <2 x i
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(3)
-; GFX11-NEXT: buffer_store_b32 v34, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b32 v34, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: buffer_store_b32 v35, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b32 v35, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: buffer_store_b32 v36, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b32 v36, off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b64 v[32:33], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -3523,11 +3523,11 @@ define void @void_func_v32i32_v2i64_v2f64(<32 x i32> %arg0, <2 x i64> %arg1, <2
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[32:35], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -3606,11 +3606,11 @@ define void @void_func_v32i32_v4i32_v4f32(<32 x i32> %arg0, <4 x i32> %arg1, <4
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(4)
-; GFX11-NEXT: buffer_store_b128 v[32:35], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[32:35], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -3808,17 +3808,17 @@ define void @void_func_v32i32_v8i32_v8f32(<32 x i32> %arg0, <8 x i32> %arg1, <8
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(3)
-; GFX11-NEXT: buffer_store_b128 v[52:55], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[52:55], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: buffer_store_b128 v[48:51], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[48:51], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[32:35], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -4106,29 +4106,29 @@ define void @void_func_v32i32_v16i32_v16f32(<32 x i32> %arg0, <16 x i32> %arg1,
; GFX11-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_waitcnt vmcnt(7)
-; GFX11-NEXT: buffer_store_b128 v[84:87], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[84:87], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(6)
-; GFX11-NEXT: buffer_store_b128 v[80:83], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[80:83], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(5)
-; GFX11-NEXT: buffer_store_b128 v[68:71], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[68:71], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(4)
-; GFX11-NEXT: buffer_store_b128 v[64:67], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[64:67], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(3)
-; GFX11-NEXT: buffer_store_b128 v[52:55], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[52:55], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: buffer_store_b128 v[48:51], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[48:51], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-NEXT: buffer_store_b128 v[36:39], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: buffer_store_b128 v[32:35], off, s[0:3], 0 dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]
@@ -4587,53 +4587,53 @@ define void @void_func_v32i32_v16i8(<32 x i32> %arg0, <16 x i8> %arg1) #0 {
; GFX11-TRUE16-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(15)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v32, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v32, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(14)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v33, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v33, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(13)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v34, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v34, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(12)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v35, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v35, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(11)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v36, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v36, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(10)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v37, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v37, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(9)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v38, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v38, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(8)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v39, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v39, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(7)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v48, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v48, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(6)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v49, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v49, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(5)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v50, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v50, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(4)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v51, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v51, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(3)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v52, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v52, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(2)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v53, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v53, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(1)
-; GFX11-TRUE16-NEXT: buffer_store_b8 v54, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-TRUE16-NEXT: buffer_store_b8 v54, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(0)
+; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: buffer_store_b8 v55, off, s[0:3], 0 dlc
; GFX11-TRUE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-TRUE16-NEXT: s_setpc_b64 s[30:31]
@@ -4677,53 +4677,53 @@ define void @void_func_v32i32_v16i8(<32 x i32> %arg0, <16 x i8> %arg1) #0 {
; GFX11-FAKE16-NEXT: buffer_store_b128 v[4:7], off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: buffer_store_b128 v[0:3], off, s[0:3], 0 dlc
-; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(15)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v32, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v32, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(14)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v33, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v33, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(13)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v34, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v34, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(12)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v35, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v35, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(11)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v36, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v36, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(10)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v37, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v37, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(9)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v38, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v38, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(8)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v39, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v39, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(7)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v48, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v48, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(6)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v49, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v49, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(5)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v50, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v50, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(4)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v51, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v51, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(3)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v52, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v52, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(2)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v53, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v53, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(1)
-; GFX11-FAKE16-NEXT: buffer_store_b8 v54, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX11-FAKE16-NEXT: buffer_store_b8 v54, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt vmcnt(0)
+; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: buffer_store_b8 v55, off, s[0:3], 0 dlc
; GFX11-FAKE16-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FAKE16-NEXT: s_setpc_b64 s[30:31]
diff --git a/llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll b/llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll
index 17a5f520ff41e..5ae466a4ca188 100644
--- a/llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll
+++ b/llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll
@@ -7802,10 +7802,12 @@ define amdgpu_kernel void @multi_same_block(i32 %arg) {
; NOOPT-NEXT: ; implicit-def: $sgpr0
; NOOPT-NEXT: v_mov_b32_e32 v0, s0
; NOOPT-NEXT: ds_write_b32 v0, v2
+; NOOPT-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; NOOPT-NEXT: s_mov_b32 m0, -1
; NOOPT-NEXT: ; implicit-def: $sgpr0
; NOOPT-NEXT: v_mov_b32_e32 v0, s0
; NOOPT-NEXT: ds_write_b32 v0, v1
+; NOOPT-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; NOOPT-NEXT: s_endpgm
;
; SI-MOVREL-LABEL: multi_same_block:
diff --git a/llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll b/llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll
index 0681263b7428e..71e85e4b948d8 100644
--- a/llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll
+++ b/llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll
@@ -71,6 +71,7 @@ define amdgpu_kernel void @test_kernel(i32 %val) #0 {
; CHECK-NEXT: v_mov_b32_e32 v0, s4
; CHECK-NEXT: s_waitcnt vmcnt(0)
; CHECK-NEXT: ds_write_b32 v0, v1
+; CHECK-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; CHECK-NEXT: s_endpgm
; CHECK-NEXT: .LBB0_2: ; %end
; CHECK-NEXT: s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
index 44415657b6336..d319acb064f31 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
@@ -167,6 +167,7 @@ define weak_odr amdgpu_kernel void @dpp_test1(ptr %arg) local_unnamed_addr {
; GFX8-NOOPT-NEXT: ds_read_b32 v0, v3
; GFX8-NOOPT-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NOOPT-NEXT: s_barrier
+; GFX8-NOOPT-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX8-NOOPT-NEXT: v_add_u32_e64 v1, s[0:1], v0, v0
; GFX8-NOOPT-NEXT: v_mov_b32_e32 v0, 0
; GFX8-NOOPT-NEXT: s_nop 1
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll
index 1379eb61e0853..fc6f91bdde09a 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll
@@ -16,10 +16,12 @@
define amdgpu_kernel void @workgroup_acquire_fence() {
; GFX6-LABEL: workgroup_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_acquire_fence:
@@ -31,14 +33,17 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX10-CU-LABEL: workgroup_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:
@@ -49,6 +54,7 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_acquire_fence:
@@ -66,6 +72,7 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX11-CU-LABEL: workgroup_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_acquire_fence:
@@ -77,6 +84,7 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX12-CU-LABEL: workgroup_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acquire, !mmra !{!"amdgpu-as", !"global"}
@@ -86,10 +94,12 @@ entry:
define amdgpu_kernel void @workgroup_release_fence() {
; GFX6-LABEL: workgroup_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_release_fence:
@@ -100,14 +110,17 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX10-CU-LABEL: workgroup_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:
@@ -117,6 +130,7 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_release_fence:
@@ -132,6 +146,7 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX11-CU-LABEL: workgroup_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_release_fence:
@@ -144,6 +159,7 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX12-CU-LABEL: workgroup_release_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") release, !mmra !{!"amdgpu-as", !"global"}
@@ -153,10 +169,12 @@ entry:
define amdgpu_kernel void @workgroup_acq_rel_fence() {
; GFX6-LABEL: workgroup_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_acq_rel_fence:
@@ -168,14 +186,17 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX10-CU-LABEL: workgroup_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:
@@ -186,6 +207,7 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_acq_rel_fence:
@@ -203,6 +225,7 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX11-CU-LABEL: workgroup_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_acq_rel_fence:
@@ -216,6 +239,7 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX12-CU-LABEL: workgroup_acq_rel_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acq_rel, !mmra !{!"amdgpu-as", !"global"}
@@ -225,10 +249,12 @@ entry:
define amdgpu_kernel void @workgroup_seq_cst_fence() {
; GFX6-LABEL: workgroup_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_seq_cst_fence:
@@ -240,14 +266,17 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX10-CU-LABEL: workgroup_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:
@@ -258,6 +287,7 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_seq_cst_fence:
@@ -275,6 +305,7 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX11-CU-LABEL: workgroup_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_seq_cst_fence:
@@ -288,6 +319,7 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX12-CU-LABEL: workgroup_seq_cst_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") seq_cst, !mmra !{!"amdgpu-as", !"global"}
@@ -297,10 +329,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
; GFX6-LABEL: workgroup_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
@@ -312,14 +346,17 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
@@ -330,6 +367,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
@@ -347,6 +385,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_acquire_fence:
@@ -358,6 +397,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acquire, !mmra !{!"amdgpu-as", !"global"}
@@ -367,10 +407,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_release_fence() {
; GFX6-LABEL: workgroup_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_release_fence:
@@ -381,14 +423,17 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
@@ -398,6 +443,7 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_release_fence:
@@ -413,6 +459,7 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_release_fence:
@@ -425,6 +472,7 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_release_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") release, !mmra !{!"amdgpu-as", !"global"}
@@ -434,10 +482,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
@@ -449,14 +499,17 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
@@ -467,6 +520,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
@@ -484,6 +538,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_acq_rel_fence:
@@ -497,6 +552,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acq_rel, !mmra !{!"amdgpu-as", !"global"}
@@ -506,10 +562,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
@@ -521,14 +579,17 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
@@ -539,6 +600,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
@@ -556,6 +618,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_seq_cst_fence:
@@ -569,6 +632,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") seq_cst, !mmra !{!"amdgpu-as", !"global"}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll
index 971015b391ca8..7931c1ea27e59 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll
@@ -46,6 +46,7 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
@@ -55,6 +56,7 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX942-TGSPLIT-LABEL: workgroup_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_acquire_fence:
@@ -69,12 +71,12 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX12-WGP-LABEL: workgroup_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: workgroup_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acquire, !mmra !{!"amdgpu-as", !"local"}
@@ -114,6 +116,7 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_release_fence:
@@ -123,6 +126,7 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX942-TGSPLIT-LABEL: workgroup_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_release_fence:
@@ -180,6 +184,7 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
@@ -189,6 +194,7 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX942-TGSPLIT-LABEL: workgroup_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_acq_rel_fence:
@@ -246,6 +252,7 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
@@ -255,6 +262,7 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX942-TGSPLIT-LABEL: workgroup_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_seq_cst_fence:
@@ -282,54 +290,67 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
; GFX6-LABEL: workgroup_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_one_as_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acquire, !mmra !{!"amdgpu-as", !"local"}
@@ -339,46 +360,57 @@ entry:
define amdgpu_kernel void @workgroup_one_as_release_fence() {
; GFX6-LABEL: workgroup_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: workgroup_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_one_as_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: workgroup_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_release_fence:
@@ -396,46 +428,57 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_one_as_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_acq_rel_fence:
@@ -453,46 +496,57 @@ entry:
define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: workgroup_one_as_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_seq_cst_fence:
@@ -540,6 +594,7 @@ define amdgpu_kernel void @agent_acquire_fence() {
;
; GFX90A-TGSPLIT-LABEL: agent_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_acquire_fence:
@@ -549,6 +604,7 @@ define amdgpu_kernel void @agent_acquire_fence() {
;
; GFX942-TGSPLIT-LABEL: agent_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_acquire_fence:
@@ -563,12 +619,12 @@ define amdgpu_kernel void @agent_acquire_fence() {
;
; GFX12-WGP-LABEL: agent_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: agent_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("agent") acquire, !mmra !{!"amdgpu-as", !"local"}
@@ -608,6 +664,7 @@ define amdgpu_kernel void @agent_release_fence() {
;
; GFX90A-TGSPLIT-LABEL: agent_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_release_fence:
@@ -617,6 +674,7 @@ define amdgpu_kernel void @agent_release_fence() {
;
; GFX942-TGSPLIT-LABEL: agent_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_release_fence:
@@ -674,6 +732,7 @@ define amdgpu_kernel void @agent_acq_rel_fence() {
;
; GFX90A-TGSPLIT-LABEL: agent_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
@@ -683,6 +742,7 @@ define amdgpu_kernel void @agent_acq_rel_fence() {
;
; GFX942-TGSPLIT-LABEL: agent_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_acq_rel_fence:
@@ -740,6 +800,7 @@ define amdgpu_kernel void @agent_seq_cst_fence() {
;
; GFX90A-TGSPLIT-LABEL: agent_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
@@ -749,6 +810,7 @@ define amdgpu_kernel void @agent_seq_cst_fence() {
;
; GFX942-TGSPLIT-LABEL: agent_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_seq_cst_fence:
@@ -776,54 +838,67 @@ entry:
define amdgpu_kernel void @agent_one_as_acquire_fence() {
; GFX6-LABEL: agent_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: agent_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: agent_one_as_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: agent_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: agent_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: agent_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: agent_one_as_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_one_as_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: agent_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: agent_one_as_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: agent_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("agent-one-as") acquire, !mmra !{!"amdgpu-as", !"local"}
@@ -833,46 +908,57 @@ entry:
define amdgpu_kernel void @agent_one_as_release_fence() {
; GFX6-LABEL: agent_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: agent_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: agent_one_as_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: agent_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: agent_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: agent_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: agent_one_as_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_one_as_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: agent_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: agent_one_as_release_fence:
@@ -890,46 +976,57 @@ entry:
define amdgpu_kernel void @agent_one_as_acq_rel_fence() {
; GFX6-LABEL: agent_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: agent_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: agent_one_as_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: agent_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: agent_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_one_as_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: agent_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: agent_one_as_acq_rel_fence:
@@ -947,46 +1044,57 @@ entry:
define amdgpu_kernel void @agent_one_as_seq_cst_fence() {
; GFX6-LABEL: agent_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: agent_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: agent_one_as_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: agent_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: agent_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: agent_one_as_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: agent_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: agent_one_as_seq_cst_fence:
@@ -1034,6 +1142,7 @@ define amdgpu_kernel void @system_acquire_fence() {
;
; GFX90A-TGSPLIT-LABEL: system_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_acquire_fence:
@@ -1043,6 +1152,7 @@ define amdgpu_kernel void @system_acquire_fence() {
;
; GFX942-TGSPLIT-LABEL: system_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_acquire_fence:
@@ -1057,12 +1167,12 @@ define amdgpu_kernel void @system_acquire_fence() {
;
; GFX12-WGP-LABEL: system_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: system_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence acquire, !mmra !{!"amdgpu-as", !"local"}
@@ -1102,6 +1212,7 @@ define amdgpu_kernel void @system_release_fence() {
;
; GFX90A-TGSPLIT-LABEL: system_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_release_fence:
@@ -1111,6 +1222,7 @@ define amdgpu_kernel void @system_release_fence() {
;
; GFX942-TGSPLIT-LABEL: system_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_release_fence:
@@ -1168,6 +1280,7 @@ define amdgpu_kernel void @system_acq_rel_fence() {
;
; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_acq_rel_fence:
@@ -1177,6 +1290,7 @@ define amdgpu_kernel void @system_acq_rel_fence() {
;
; GFX942-TGSPLIT-LABEL: system_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_acq_rel_fence:
@@ -1234,6 +1348,7 @@ define amdgpu_kernel void @system_seq_cst_fence() {
;
; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_seq_cst_fence:
@@ -1243,6 +1358,7 @@ define amdgpu_kernel void @system_seq_cst_fence() {
;
; GFX942-TGSPLIT-LABEL: system_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_seq_cst_fence:
@@ -1270,54 +1386,67 @@ entry:
define amdgpu_kernel void @system_one_as_acquire_fence() {
; GFX6-LABEL: system_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: system_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: system_one_as_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: system_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: system_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: system_one_as_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_one_as_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: system_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: system_one_as_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: system_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("one-as") acquire, !mmra !{!"amdgpu-as", !"local"}
@@ -1327,46 +1456,57 @@ entry:
define amdgpu_kernel void @system_one_as_release_fence() {
; GFX6-LABEL: system_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: system_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: system_one_as_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: system_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: system_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: system_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: system_one_as_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_one_as_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: system_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: system_one_as_release_fence:
@@ -1384,46 +1524,57 @@ entry:
define amdgpu_kernel void @system_one_as_acq_rel_fence() {
; GFX6-LABEL: system_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: system_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: system_one_as_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: system_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: system_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_one_as_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: system_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: system_one_as_acq_rel_fence:
@@ -1441,46 +1592,57 @@ entry:
define amdgpu_kernel void @system_one_as_seq_cst_fence() {
; GFX6-LABEL: system_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: system_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: system_one_as_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: system_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: system_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: system_one_as_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: system_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: system_one_as_seq_cst_fence:
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
index 0e459ed0f1243..0bebcc9700793 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
@@ -16,54 +16,67 @@
define amdgpu_kernel void @singlethread_acquire_fence() {
; GFX6-LABEL: singlethread_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: singlethread_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("singlethread") acquire
@@ -73,46 +86,57 @@ entry:
define amdgpu_kernel void @singlethread_release_fence() {
; GFX6-LABEL: singlethread_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_release_fence:
@@ -130,46 +154,57 @@ entry:
define amdgpu_kernel void @singlethread_acq_rel_fence() {
; GFX6-LABEL: singlethread_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_acq_rel_fence:
@@ -187,46 +222,57 @@ entry:
define amdgpu_kernel void @singlethread_seq_cst_fence() {
; GFX6-LABEL: singlethread_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_seq_cst_fence:
@@ -244,54 +290,67 @@ entry:
define amdgpu_kernel void @singlethread_one_as_acquire_fence() {
; GFX6-LABEL: singlethread_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_one_as_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_one_as_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_one_as_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: singlethread_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("singlethread-one-as") acquire
@@ -301,46 +360,57 @@ entry:
define amdgpu_kernel void @singlethread_one_as_release_fence() {
; GFX6-LABEL: singlethread_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_one_as_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_one_as_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_one_as_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_one_as_release_fence:
@@ -358,46 +428,57 @@ entry:
define amdgpu_kernel void @singlethread_one_as_acq_rel_fence() {
; GFX6-LABEL: singlethread_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_one_as_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_one_as_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_one_as_acq_rel_fence:
@@ -415,46 +496,57 @@ entry:
define amdgpu_kernel void @singlethread_one_as_seq_cst_fence() {
; GFX6-LABEL: singlethread_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: singlethread_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: singlethread_one_as_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: singlethread_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: singlethread_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: singlethread_one_as_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: singlethread_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: singlethread_one_as_seq_cst_fence:
@@ -472,54 +564,67 @@ entry:
define amdgpu_kernel void @wavefront_acquire_fence() {
; GFX6-LABEL: wavefront_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: wavefront_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("wavefront") acquire
@@ -529,46 +634,57 @@ entry:
define amdgpu_kernel void @wavefront_release_fence() {
; GFX6-LABEL: wavefront_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_release_fence:
@@ -586,46 +702,57 @@ entry:
define amdgpu_kernel void @wavefront_acq_rel_fence() {
; GFX6-LABEL: wavefront_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_acq_rel_fence:
@@ -643,46 +770,57 @@ entry:
define amdgpu_kernel void @wavefront_seq_cst_fence() {
; GFX6-LABEL: wavefront_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_seq_cst_fence:
@@ -700,54 +838,67 @@ entry:
define amdgpu_kernel void @wavefront_one_as_acquire_fence() {
; GFX6-LABEL: wavefront_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_one_as_acquire_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_one_as_acquire_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_one_as_acquire_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_one_as_acquire_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: wavefront_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("wavefront-one-as") acquire
@@ -757,46 +908,57 @@ entry:
define amdgpu_kernel void @wavefront_one_as_release_fence() {
; GFX6-LABEL: wavefront_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_one_as_release_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_one_as_release_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_one_as_release_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_one_as_release_fence:
@@ -814,46 +976,57 @@ entry:
define amdgpu_kernel void @wavefront_one_as_acq_rel_fence() {
; GFX6-LABEL: wavefront_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_one_as_acq_rel_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_one_as_acq_rel_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_one_as_acq_rel_fence:
@@ -871,46 +1044,57 @@ entry:
define amdgpu_kernel void @wavefront_one_as_seq_cst_fence() {
; GFX6-LABEL: wavefront_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: wavefront_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: wavefront_one_as_seq_cst_fence:
; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: wavefront_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: wavefront_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
; GFX942-TGSPLIT: ; %bb.0: ; %entry
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: wavefront_one_as_seq_cst_fence:
; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: wavefront_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: wavefront_one_as_seq_cst_fence:
@@ -996,7 +1180,7 @@ define amdgpu_kernel void @workgroup_acquire_fence() {
;
; GFX12-CU-LABEL: workgroup_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acquire
@@ -1071,7 +1255,7 @@ define amdgpu_kernel void @workgroup_release_fence() {
;
; GFX12-CU-LABEL: workgroup_release_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") release
@@ -1151,7 +1335,7 @@ define amdgpu_kernel void @workgroup_acq_rel_fence() {
;
; GFX12-CU-LABEL: workgroup_acq_rel_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acq_rel
@@ -1231,7 +1415,7 @@ define amdgpu_kernel void @workgroup_seq_cst_fence() {
;
; GFX12-CU-LABEL: workgroup_seq_cst_fence:
; GFX12-CU: ; %bb.0: ; %entry
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup") seq_cst
@@ -1241,10 +1425,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
; GFX6-LABEL: workgroup_one_as_acquire_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_acquire_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
@@ -1256,14 +1442,17 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
@@ -1274,6 +1463,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
@@ -1291,6 +1481,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_acquire_fence:
@@ -1302,6 +1493,7 @@ define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_acquire_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acquire
@@ -1311,10 +1503,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_release_fence() {
; GFX6-LABEL: workgroup_one_as_release_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_release_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_release_fence:
@@ -1325,14 +1519,17 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_release_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_release_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
@@ -1342,6 +1539,7 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_release_fence:
@@ -1357,6 +1555,7 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_release_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_release_fence:
@@ -1369,6 +1568,7 @@ define amdgpu_kernel void @workgroup_one_as_release_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_release_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") release
@@ -1378,10 +1578,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
@@ -1393,14 +1595,17 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
@@ -1411,6 +1616,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
@@ -1428,6 +1634,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_acq_rel_fence:
@@ -1441,6 +1648,7 @@ define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_acq_rel_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acq_rel
@@ -1450,10 +1658,12 @@ entry:
define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
@@ -1465,14 +1675,17 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
@@ -1483,6 +1696,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX942-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
; GFX942-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
@@ -1500,6 +1714,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX11-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: workgroup_one_as_seq_cst_fence:
@@ -1513,6 +1728,7 @@ define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
;
; GFX12-CU-LABEL: workgroup_one_as_seq_cst_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") seq_cst
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
index 07ad8cb0c4a3d..69fdf19780c63 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
@@ -1747,7 +1747,8 @@ define amdgpu_kernel void @flat_agent_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1760,7 +1761,8 @@ define amdgpu_kernel void @flat_agent_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -2120,7 +2122,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2137,7 +2140,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -2324,7 +2328,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2341,7 +2346,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -3513,7 +3519,8 @@ define amdgpu_kernel void @flat_agent_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3530,7 +3537,8 @@ define amdgpu_kernel void @flat_agent_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4064,7 +4072,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4085,7 +4094,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4357,7 +4367,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4378,7 +4389,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4630,7 +4642,8 @@ define amdgpu_kernel void @flat_agent_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4647,7 +4660,8 @@ define amdgpu_kernel void @flat_agent_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4899,7 +4913,8 @@ define amdgpu_kernel void @flat_agent_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4916,7 +4931,8 @@ define amdgpu_kernel void @flat_agent_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -5188,7 +5204,8 @@ define amdgpu_kernel void @flat_agent_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5209,7 +5226,8 @@ define amdgpu_kernel void @flat_agent_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -5481,7 +5499,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5502,7 +5521,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -5774,7 +5794,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5795,7 +5816,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6067,7 +6089,8 @@ define amdgpu_kernel void @flat_agent_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6088,7 +6111,8 @@ define amdgpu_kernel void @flat_agent_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6360,7 +6384,8 @@ define amdgpu_kernel void @flat_agent_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6381,7 +6406,8 @@ define amdgpu_kernel void @flat_agent_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6653,7 +6679,8 @@ define amdgpu_kernel void @flat_agent_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6674,7 +6701,8 @@ define amdgpu_kernel void @flat_agent_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6946,7 +6974,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6967,7 +6996,8 @@ define amdgpu_kernel void @flat_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -7239,7 +7269,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
;
@@ -7260,7 +7291,8 @@ define amdgpu_kernel void @flat_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -13656,6 +13688,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -13674,6 +13707,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -13752,6 +13786,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -13766,6 +13801,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -13780,6 +13816,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -13793,6 +13830,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -14008,6 +14046,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14028,6 +14067,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14115,6 +14155,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14131,6 +14172,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14149,6 +14191,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -14166,6 +14209,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -14208,6 +14252,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14228,6 +14273,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14315,6 +14361,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14331,6 +14378,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14349,6 +14397,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -14366,6 +14415,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -15384,6 +15434,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15416,6 +15467,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15528,6 +15580,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15546,6 +15599,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15564,6 +15618,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -15581,6 +15636,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -15914,6 +15970,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15948,6 +16005,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16069,6 +16127,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16089,6 +16148,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16111,6 +16171,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16132,6 +16193,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16203,6 +16265,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16237,6 +16300,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16358,6 +16422,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16378,6 +16443,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16400,6 +16466,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16421,6 +16488,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16489,6 +16557,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16521,6 +16590,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16633,6 +16703,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16651,6 +16722,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16669,6 +16741,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16686,6 +16759,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16754,6 +16828,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16786,6 +16861,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16898,6 +16974,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16916,6 +16993,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16934,6 +17012,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16951,6 +17030,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17022,6 +17102,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17056,6 +17137,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17177,6 +17259,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17197,6 +17280,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17219,6 +17303,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17240,6 +17325,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17311,6 +17397,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17345,6 +17432,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17466,6 +17554,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17486,6 +17575,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17508,6 +17598,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17529,6 +17620,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17600,6 +17692,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17634,6 +17727,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17755,6 +17849,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17775,6 +17870,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17797,6 +17893,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17818,6 +17915,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17889,6 +17987,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17923,6 +18022,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18044,6 +18144,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18064,6 +18165,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18086,6 +18188,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -18107,6 +18210,7 @@ define amdgpu_kernel void @flat_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -18178,6 +18282,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18212,6 +18317,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18333,6 +18439,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18353,6 +18460,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18375,6 +18483,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -18396,6 +18505,7 @@ define amdgpu_kernel void @flat_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -18467,6 +18577,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18501,6 +18612,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18622,6 +18734,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18642,6 +18755,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18664,6 +18778,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -18685,6 +18800,7 @@ define amdgpu_kernel void @flat_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -18756,6 +18872,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18790,6 +18907,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18911,6 +19029,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18931,6 +19050,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18953,6 +19073,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -18974,6 +19095,7 @@ define amdgpu_kernel void @flat_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -19045,6 +19167,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -19079,6 +19202,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -19200,6 +19324,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -19220,6 +19345,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -19242,6 +19368,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -19263,6 +19390,7 @@ define amdgpu_kernel void @flat_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
index b88a10ab24a98..c97a7174c7186 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll
@@ -388,6 +388,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -406,6 +407,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -424,6 +426,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -438,6 +441,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -453,6 +457,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -467,6 +472,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -479,6 +485,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -491,6 +498,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -504,6 +512,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -518,6 +527,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -532,6 +542,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -546,6 +557,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -569,7 +581,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -587,7 +601,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -605,7 +621,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -619,7 +637,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -634,7 +654,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -648,7 +670,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -660,7 +684,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -672,7 +698,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -685,7 +713,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -699,7 +729,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -713,7 +745,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -727,7 +761,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -1050,6 +1086,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1065,6 +1102,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1080,6 +1118,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -1091,6 +1130,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1103,6 +1143,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1115,6 +1156,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1125,6 +1167,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1135,6 +1178,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1146,6 +1190,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1157,6 +1202,7 @@ define amdgpu_kernel void @flat_singlethread_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -1199,6 +1245,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1214,6 +1261,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1229,6 +1277,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -1240,6 +1289,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1252,6 +1302,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1264,6 +1315,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1274,6 +1326,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1284,6 +1337,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1295,6 +1349,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1306,6 +1361,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -1498,6 +1554,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1513,6 +1570,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1528,6 +1586,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1539,6 +1598,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1551,6 +1611,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1563,6 +1624,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1573,6 +1635,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1583,6 +1646,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1594,6 +1658,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1605,6 +1670,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1616,6 +1682,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acquire_atomicrmw:
@@ -1627,6 +1694,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -1646,6 +1714,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1661,6 +1730,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1676,6 +1746,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -1687,6 +1758,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1699,6 +1771,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1711,6 +1784,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1721,6 +1795,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1731,6 +1806,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1742,6 +1818,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1753,6 +1830,7 @@ define amdgpu_kernel void @flat_singlethread_release_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -1795,7 +1873,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1810,7 +1890,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1825,7 +1907,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1836,7 +1920,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1848,7 +1934,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1860,7 +1948,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1870,7 +1960,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1880,7 +1972,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1891,7 +1985,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1902,7 +1998,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1914,6 +2012,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acq_rel_atomicrmw:
@@ -1925,6 +2024,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -1944,7 +2044,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -1959,7 +2061,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -1974,7 +2078,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -1985,7 +2091,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -1997,7 +2105,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2009,7 +2119,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2019,7 +2131,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2029,7 +2143,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2040,7 +2156,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2051,7 +2169,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2063,6 +2183,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_seq_cst_atomicrmw:
@@ -2074,6 +2195,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -2094,6 +2216,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2113,6 +2236,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2132,6 +2256,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2147,6 +2272,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2163,6 +2289,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2178,6 +2305,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2191,6 +2319,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2204,6 +2333,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2218,6 +2348,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2233,6 +2364,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2248,6 +2380,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2263,6 +2396,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2287,7 +2421,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2306,7 +2442,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2325,7 +2463,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2340,7 +2480,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2356,7 +2498,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2371,7 +2515,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2384,7 +2530,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2397,7 +2545,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2411,7 +2561,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2426,7 +2578,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2442,6 +2596,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2457,6 +2612,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2481,7 +2637,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2500,7 +2658,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2519,7 +2679,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2534,7 +2696,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2550,7 +2714,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2565,7 +2731,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2578,7 +2746,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2591,7 +2761,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2605,7 +2777,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2620,7 +2794,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2636,6 +2812,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2651,6 +2828,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2928,6 +3106,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -2957,6 +3136,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -2986,6 +3166,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3011,6 +3192,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3027,6 +3209,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3043,6 +3226,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3057,6 +3241,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3071,6 +3256,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3086,6 +3272,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3101,6 +3288,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3116,6 +3304,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
@@ -3131,6 +3320,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3165,6 +3355,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -3194,6 +3385,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -3223,6 +3415,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-CU-NEXT: s_endpgm
;
@@ -3248,6 +3441,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -3264,6 +3458,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3280,6 +3475,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3294,6 +3490,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3308,6 +3505,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3323,6 +3521,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -3338,6 +3537,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -3403,7 +3603,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3432,7 +3634,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3461,7 +3665,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3486,7 +3692,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3502,7 +3710,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3518,7 +3728,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3532,7 +3744,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3546,7 +3760,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3561,7 +3777,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3576,7 +3794,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3592,6 +3812,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3607,6 +3828,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3641,7 +3863,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3670,7 +3894,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3699,7 +3925,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3724,7 +3952,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3740,7 +3970,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3756,7 +3988,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3770,7 +4004,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3784,7 +4020,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3799,7 +4037,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3814,7 +4054,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3830,6 +4072,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3845,6 +4088,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3880,6 +4124,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -3909,6 +4154,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -3938,6 +4184,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -3963,6 +4210,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -3979,6 +4227,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -3995,6 +4244,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -4009,6 +4259,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -4023,6 +4274,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -4038,6 +4290,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -4053,6 +4306,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -4068,6 +4322,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
@@ -4083,6 +4338,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4118,6 +4374,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4147,6 +4404,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4176,6 +4434,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4201,6 +4460,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4217,6 +4477,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4233,6 +4494,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4247,6 +4509,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4261,6 +4524,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4276,6 +4540,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4291,6 +4556,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4306,6 +4572,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
@@ -4321,6 +4588,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4355,7 +4623,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4384,7 +4654,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4413,7 +4685,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4438,7 +4712,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4454,7 +4730,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4470,7 +4748,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4484,7 +4764,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4498,7 +4780,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4513,7 +4797,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4528,7 +4814,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4544,6 +4832,7 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_release_acquire_cmpxchg:
@@ -4559,6 +4848,7 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4593,7 +4883,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4622,7 +4914,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4651,7 +4945,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4676,7 +4972,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4692,7 +4990,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4708,7 +5008,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4722,7 +5024,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4736,7 +5040,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4751,7 +5057,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4766,7 +5074,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4782,6 +5092,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
@@ -4797,6 +5108,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4831,7 +5143,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4860,7 +5174,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4889,7 +5205,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4914,7 +5232,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4930,7 +5250,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4946,7 +5268,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4960,7 +5284,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4974,7 +5300,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -4989,7 +5317,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -5004,7 +5334,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -5020,6 +5352,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
@@ -5035,6 +5368,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5069,7 +5403,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5098,7 +5434,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5127,7 +5465,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5152,7 +5492,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5168,7 +5510,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5184,7 +5528,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5198,7 +5544,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5212,7 +5560,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5227,7 +5577,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5242,7 +5594,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5258,6 +5612,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5273,6 +5628,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5307,7 +5663,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5336,7 +5694,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5365,7 +5725,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5390,7 +5752,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5406,7 +5770,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5422,7 +5788,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5436,7 +5804,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5450,7 +5820,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5465,7 +5837,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5480,7 +5854,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5496,6 +5872,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
@@ -5511,6 +5888,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5545,7 +5923,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5574,7 +5954,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5603,7 +5985,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5628,7 +6012,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5644,7 +6030,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5660,7 +6048,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5674,7 +6064,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5688,7 +6080,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5703,7 +6097,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5718,7 +6114,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5734,6 +6132,7 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
@@ -5749,6 +6148,7 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5783,7 +6183,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5812,7 +6214,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5841,7 +6245,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5866,7 +6272,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5882,7 +6290,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5898,7 +6308,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5912,7 +6324,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5926,7 +6340,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5941,7 +6357,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5956,7 +6374,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5972,6 +6392,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5987,6 +6408,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -6021,7 +6443,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6050,7 +6474,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6079,7 +6505,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6104,7 +6532,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6120,7 +6550,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6136,7 +6568,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6150,7 +6584,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6164,7 +6600,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6179,7 +6617,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6194,7 +6634,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6210,6 +6652,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -6225,6 +6668,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -6544,6 +6988,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6577,6 +7022,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6610,6 +7056,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6639,6 +7086,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6659,6 +7107,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6678,6 +7127,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6695,6 +7145,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6712,6 +7163,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6730,6 +7182,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6749,6 +7202,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6768,6 +7222,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -6787,6 +7242,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -6827,6 +7283,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -6860,6 +7317,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
@@ -6893,6 +7351,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
@@ -6922,6 +7381,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
@@ -6942,6 +7402,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6961,6 +7422,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
@@ -6978,6 +7440,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6995,6 +7458,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
@@ -7013,6 +7477,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
@@ -7032,6 +7497,7 @@ define amdgpu_kernel void @flat_singlethread_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
@@ -7111,7 +7577,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7144,7 +7612,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7177,7 +7647,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7206,7 +7678,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7226,7 +7700,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7245,7 +7721,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7262,7 +7740,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7279,7 +7759,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7297,7 +7779,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7316,7 +7800,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7336,6 +7822,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7355,6 +7842,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7395,7 +7883,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7428,7 +7918,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7461,7 +7953,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7490,7 +7984,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7510,7 +8006,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7529,7 +8027,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7546,7 +8046,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7563,7 +8065,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7581,7 +8085,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7600,7 +8106,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7620,6 +8128,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7639,6 +8148,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7680,6 +8190,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7713,6 +8224,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7746,6 +8258,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7775,6 +8288,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7795,6 +8309,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7814,6 +8329,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7831,6 +8347,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7848,6 +8365,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7866,6 +8384,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7885,6 +8404,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7904,6 +8424,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7923,6 +8444,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7964,6 +8486,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7997,6 +8520,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8030,6 +8554,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8059,6 +8584,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8079,6 +8605,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8098,6 +8625,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8115,6 +8643,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8132,6 +8661,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8150,6 +8680,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8169,6 +8700,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8188,6 +8720,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8207,6 +8740,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8247,7 +8781,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8280,7 +8816,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8313,7 +8851,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8342,7 +8882,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8362,7 +8904,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8381,7 +8925,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8398,7 +8944,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8415,7 +8963,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8433,7 +8983,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8452,7 +9004,9 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8472,6 +9026,7 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8491,6 +9046,7 @@ define amdgpu_kernel void @flat_singlethread_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8531,7 +9087,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8564,7 +9122,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8597,7 +9157,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8626,7 +9188,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8646,7 +9210,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8665,7 +9231,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8682,7 +9250,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8699,7 +9269,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8717,7 +9289,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8736,7 +9310,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8756,6 +9332,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8775,6 +9352,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8815,7 +9393,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8848,7 +9428,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8881,7 +9463,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8910,7 +9494,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8930,7 +9516,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8949,7 +9537,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8966,7 +9556,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8983,7 +9575,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9001,7 +9595,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9020,7 +9616,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9040,6 +9638,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9059,6 +9658,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9099,7 +9699,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9132,7 +9734,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9165,7 +9769,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9194,7 +9800,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9214,7 +9822,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9233,7 +9843,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9250,7 +9862,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9267,7 +9881,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9285,7 +9901,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9304,7 +9922,9 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9324,6 +9944,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9343,6 +9964,7 @@ define amdgpu_kernel void @flat_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9383,7 +10005,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9416,7 +10040,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9449,7 +10075,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9478,7 +10106,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9498,7 +10128,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9517,7 +10149,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9534,7 +10168,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9551,7 +10187,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9569,7 +10207,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9588,7 +10228,9 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9608,6 +10250,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9627,6 +10270,7 @@ define amdgpu_kernel void @flat_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9667,7 +10311,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9700,7 +10346,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9733,7 +10381,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9762,7 +10412,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9782,7 +10434,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9801,7 +10455,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9818,7 +10474,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9835,7 +10493,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9853,7 +10513,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9872,7 +10534,9 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9892,6 +10556,7 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9911,6 +10576,7 @@ define amdgpu_kernel void @flat_singlethread_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9951,7 +10617,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9984,7 +10652,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10017,7 +10687,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10046,7 +10718,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10066,7 +10740,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10085,7 +10761,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10102,7 +10780,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10119,7 +10799,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10137,7 +10819,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10156,7 +10840,9 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10176,6 +10862,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10195,6 +10882,7 @@ define amdgpu_kernel void @flat_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10235,7 +10923,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10268,7 +10958,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10301,7 +10993,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10330,7 +11024,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10350,7 +11046,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10369,7 +11067,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10386,7 +11086,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10403,7 +11105,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10421,7 +11125,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10440,7 +11146,9 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10460,6 +11168,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10479,6 +11188,7 @@ define amdgpu_kernel void @flat_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10869,6 +11579,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10887,6 +11598,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10905,6 +11617,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10919,6 +11632,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10934,6 +11648,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10948,6 +11663,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10960,6 +11676,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10972,6 +11689,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10985,6 +11703,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10999,6 +11718,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11013,6 +11733,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11027,6 +11748,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11050,7 +11772,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11068,7 +11792,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11086,7 +11812,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11100,7 +11828,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11115,7 +11845,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11129,7 +11861,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11141,7 +11875,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11153,7 +11889,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11166,7 +11904,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11180,7 +11920,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11194,7 +11936,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11208,7 +11952,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11531,6 +12277,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11546,6 +12293,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11561,6 +12309,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -11572,6 +12321,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11584,6 +12334,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11596,6 +12347,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11606,6 +12358,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11616,6 +12369,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11627,6 +12381,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11638,6 +12393,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -11680,6 +12436,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11695,6 +12452,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11710,6 +12468,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -11721,6 +12480,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11733,6 +12493,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11745,6 +12506,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11755,6 +12517,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11765,6 +12528,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11776,6 +12540,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11787,6 +12552,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -11979,6 +12745,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -11994,6 +12761,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12009,6 +12777,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12020,6 +12789,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12032,6 +12802,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12044,6 +12815,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12054,6 +12826,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12064,6 +12837,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12075,6 +12849,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12086,6 +12861,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12097,6 +12873,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
@@ -12108,6 +12885,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12127,6 +12905,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -12142,6 +12921,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -12157,6 +12937,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -12168,6 +12949,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -12180,6 +12962,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12192,6 +12975,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -12202,6 +12986,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12212,6 +12997,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -12223,6 +13009,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -12234,6 +13021,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -12276,7 +13064,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12291,7 +13081,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12306,7 +13098,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12317,7 +13111,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12329,7 +13125,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12341,7 +13139,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12351,7 +13151,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12361,7 +13163,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12372,7 +13176,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12383,7 +13189,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12395,6 +13203,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
@@ -12406,6 +13215,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12425,7 +13235,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12440,7 +13252,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12455,7 +13269,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12466,7 +13282,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12478,7 +13296,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12490,7 +13310,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12500,7 +13322,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12510,7 +13334,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12521,7 +13347,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12532,7 +13360,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12544,6 +13374,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
@@ -12555,6 +13386,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12575,6 +13407,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12594,6 +13427,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12613,6 +13447,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12628,6 +13463,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12644,6 +13480,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12659,6 +13496,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12672,6 +13510,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12685,6 +13524,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12699,6 +13539,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12714,6 +13555,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12729,6 +13571,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12744,6 +13587,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12768,7 +13612,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12787,7 +13633,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12806,7 +13654,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12821,7 +13671,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12837,7 +13689,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12852,7 +13706,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12865,7 +13721,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12878,7 +13736,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12892,7 +13752,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12907,7 +13769,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12923,6 +13787,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12938,6 +13803,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12962,7 +13828,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12981,7 +13849,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13000,7 +13870,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13015,7 +13887,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13031,7 +13905,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13046,7 +13922,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13059,7 +13937,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13072,7 +13952,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13086,7 +13968,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13101,7 +13985,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13117,6 +14003,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -13132,6 +14019,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -13409,6 +14297,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13438,6 +14327,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13467,6 +14357,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13492,6 +14383,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13508,6 +14400,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13524,6 +14417,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13538,6 +14432,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13552,6 +14447,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13567,6 +14463,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13582,6 +14479,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13597,6 +14495,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -13612,6 +14511,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -13646,6 +14546,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -13675,6 +14576,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -13704,6 +14606,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-CU-NEXT: s_endpgm
;
@@ -13729,6 +14632,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -13745,6 +14649,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13761,6 +14666,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -13775,6 +14681,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13789,6 +14696,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -13804,6 +14712,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -13819,6 +14728,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -13884,7 +14794,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13913,7 +14825,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13942,7 +14856,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13967,7 +14883,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13983,7 +14901,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13999,7 +14919,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -14013,7 +14935,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -14027,7 +14951,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -14042,7 +14968,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -14057,7 +14985,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -14073,6 +15003,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -14088,6 +15019,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14122,7 +15054,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14151,7 +15085,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14180,7 +15116,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14205,7 +15143,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14221,7 +15161,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14237,7 +15179,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14251,7 +15195,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14265,7 +15211,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14280,7 +15228,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14295,7 +15245,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14311,6 +15263,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -14326,6 +15279,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14361,6 +15315,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14390,6 +15345,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14419,6 +15375,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14444,6 +15401,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14460,6 +15418,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14476,6 +15435,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14490,6 +15450,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14504,6 +15465,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14519,6 +15481,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14534,6 +15497,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14549,6 +15513,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -14564,6 +15529,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14599,6 +15565,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14628,6 +15595,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14657,6 +15625,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14682,6 +15651,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14698,6 +15668,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14714,6 +15685,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14728,6 +15700,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14742,6 +15715,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14757,6 +15731,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14772,6 +15747,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14787,6 +15763,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -14802,6 +15779,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14836,7 +15814,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14865,7 +15845,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14894,7 +15876,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14919,7 +15903,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14935,7 +15921,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14951,7 +15939,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14965,7 +15955,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14979,7 +15971,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -14994,7 +15988,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -15009,7 +16005,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -15025,6 +16023,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
@@ -15040,6 +16039,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15074,7 +16074,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15103,7 +16105,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15132,7 +16136,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15157,7 +16163,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15173,7 +16181,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15189,7 +16199,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15203,7 +16215,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15217,7 +16231,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15232,7 +16248,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15247,7 +16265,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15263,6 +16283,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -15278,6 +16299,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15312,7 +16334,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15341,7 +16365,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15370,7 +16396,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15395,7 +16423,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15411,7 +16441,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15427,7 +16459,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15441,7 +16475,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15455,7 +16491,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15470,7 +16508,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15485,7 +16525,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15501,6 +16543,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -15516,6 +16559,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15550,7 +16594,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15579,7 +16625,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15608,7 +16656,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15633,7 +16683,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15649,7 +16701,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15665,7 +16719,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15679,7 +16735,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15693,7 +16751,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15708,7 +16768,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15723,7 +16785,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15739,6 +16803,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -15754,6 +16819,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15788,7 +16854,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15817,7 +16885,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15846,7 +16916,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15871,7 +16943,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15887,7 +16961,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15903,7 +16979,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15917,7 +16995,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15931,7 +17011,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15946,7 +17028,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15961,7 +17045,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15977,6 +17063,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15992,6 +17079,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16026,7 +17114,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16055,7 +17145,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16084,7 +17176,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16109,7 +17203,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16125,7 +17221,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16141,7 +17239,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16155,7 +17255,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16169,7 +17271,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16184,7 +17288,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16199,7 +17305,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16215,6 +17323,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -16230,6 +17339,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16264,7 +17374,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16293,7 +17405,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16322,7 +17436,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16347,7 +17463,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16363,7 +17481,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16379,7 +17499,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16393,7 +17515,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16407,7 +17531,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16422,7 +17548,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16437,7 +17565,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16453,6 +17583,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16468,6 +17599,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16502,7 +17634,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16531,7 +17665,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16560,7 +17696,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16585,7 +17723,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16601,7 +17741,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16617,7 +17759,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16631,7 +17775,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16645,7 +17791,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16660,7 +17808,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16675,7 +17825,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16691,6 +17843,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16706,6 +17859,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -17025,6 +18179,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17058,6 +18213,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17091,6 +18247,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17120,6 +18277,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17140,6 +18298,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17159,6 +18318,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17176,6 +18336,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17193,6 +18354,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17211,6 +18373,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17230,6 +18393,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17249,6 +18413,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17268,6 +18433,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_monotonic_ret_cmpxch
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17308,6 +18474,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -17341,6 +18508,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
@@ -17374,6 +18542,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
@@ -17403,6 +18572,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
@@ -17423,6 +18593,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17442,6 +18613,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
@@ -17459,6 +18631,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17476,6 +18649,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
@@ -17494,6 +18668,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
@@ -17513,6 +18688,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_monotonic_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
@@ -17592,7 +18768,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17625,7 +18803,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17658,7 +18838,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17687,7 +18869,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17707,7 +18891,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17726,7 +18912,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17743,7 +18931,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17760,7 +18950,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17778,7 +18970,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17797,7 +18991,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17817,6 +19013,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17836,6 +19033,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxch
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17876,7 +19074,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17909,7 +19109,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17942,7 +19144,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17971,7 +19175,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17991,7 +19197,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18010,7 +19218,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18027,7 +19237,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18044,7 +19256,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18062,7 +19276,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18081,7 +19297,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18101,6 +19319,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18120,6 +19339,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxch
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18161,6 +19381,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18194,6 +19415,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18227,6 +19449,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18256,6 +19479,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18276,6 +19500,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18295,6 +19520,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18312,6 +19538,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18329,6 +19556,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18347,6 +19575,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18366,6 +19595,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18385,6 +19615,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18404,6 +19635,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_acquire_ret_cmpxch
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18445,6 +19677,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18478,6 +19711,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18511,6 +19745,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18540,6 +19775,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18560,6 +19796,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18579,6 +19816,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18596,6 +19834,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18613,6 +19852,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18631,6 +19871,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18650,6 +19891,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18669,6 +19911,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18688,6 +19931,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18728,7 +19972,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18761,7 +20007,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18794,7 +20042,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18823,7 +20073,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18843,7 +20095,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18862,7 +20116,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18879,7 +20135,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18896,7 +20154,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18914,7 +20174,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18933,7 +20195,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18953,6 +20217,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18972,6 +20237,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19012,7 +20278,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19045,7 +20313,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19078,7 +20348,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19107,7 +20379,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19127,7 +20401,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19146,7 +20422,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19163,7 +20441,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19180,7 +20460,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19198,7 +20480,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19217,7 +20501,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19237,6 +20523,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19256,6 +20543,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19296,7 +20584,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19329,7 +20619,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19362,7 +20654,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19391,7 +20685,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19411,7 +20707,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19430,7 +20728,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19447,7 +20747,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19464,7 +20766,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19482,7 +20786,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19501,7 +20807,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19521,6 +20829,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19540,6 +20849,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19580,7 +20890,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19613,7 +20925,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19646,7 +20960,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19675,7 +20991,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19695,7 +21013,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19714,7 +21034,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19731,7 +21053,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19748,7 +21072,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19766,7 +21092,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19785,7 +21113,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19805,6 +21135,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19824,6 +21155,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxch
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19864,7 +21196,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19897,7 +21231,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19930,7 +21266,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19959,7 +21297,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19979,7 +21319,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19998,7 +21340,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20015,7 +21359,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20032,7 +21378,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20050,7 +21398,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20069,7 +21419,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20089,6 +21441,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20108,6 +21461,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20148,7 +21502,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20181,7 +21537,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20214,7 +21572,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20243,7 +21603,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20263,7 +21625,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20282,7 +21646,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20299,7 +21665,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20316,7 +21684,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20334,7 +21704,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20353,7 +21725,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20373,6 +21747,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20392,6 +21767,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20432,7 +21808,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20465,7 +21843,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20498,7 +21878,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20527,7 +21909,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20547,7 +21931,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20566,7 +21952,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20583,7 +21971,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20600,7 +21990,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20618,7 +22010,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20637,7 +22031,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20657,6 +22053,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20676,6 +22073,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20716,7 +22114,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20749,7 +22149,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20782,7 +22184,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20811,7 +22215,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20831,7 +22237,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20850,7 +22258,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20867,7 +22277,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20884,7 +22296,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20902,7 +22316,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20921,7 +22337,9 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20941,6 +22359,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20960,6 +22379,7 @@ define amdgpu_kernel void @flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
index 919fc3e8f4e4f..6907ef8cd7c26 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
@@ -1761,7 +1761,8 @@ define amdgpu_kernel void @flat_system_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1774,7 +1775,8 @@ define amdgpu_kernel void @flat_system_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -2143,7 +2145,8 @@ define amdgpu_kernel void @flat_system_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2161,7 +2164,8 @@ define amdgpu_kernel void @flat_system_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -2353,7 +2357,8 @@ define amdgpu_kernel void @flat_system_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2371,7 +2376,8 @@ define amdgpu_kernel void @flat_system_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -3559,7 +3565,8 @@ define amdgpu_kernel void @flat_system_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3576,7 +3583,8 @@ define amdgpu_kernel void @flat_system_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4119,7 +4127,8 @@ define amdgpu_kernel void @flat_system_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4141,7 +4150,8 @@ define amdgpu_kernel void @flat_system_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4418,7 +4428,8 @@ define amdgpu_kernel void @flat_system_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4440,7 +4451,8 @@ define amdgpu_kernel void @flat_system_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4694,7 +4706,8 @@ define amdgpu_kernel void @flat_system_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4711,7 +4724,8 @@ define amdgpu_kernel void @flat_system_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -4965,7 +4979,8 @@ define amdgpu_kernel void @flat_system_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4982,7 +4997,8 @@ define amdgpu_kernel void @flat_system_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -5259,7 +5275,8 @@ define amdgpu_kernel void @flat_system_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5281,7 +5298,8 @@ define amdgpu_kernel void @flat_system_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -5558,7 +5576,8 @@ define amdgpu_kernel void @flat_system_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5580,7 +5599,8 @@ define amdgpu_kernel void @flat_system_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -5857,7 +5877,8 @@ define amdgpu_kernel void @flat_system_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5879,7 +5900,8 @@ define amdgpu_kernel void @flat_system_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6156,7 +6178,8 @@ define amdgpu_kernel void @flat_system_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6178,7 +6201,8 @@ define amdgpu_kernel void @flat_system_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6455,7 +6479,8 @@ define amdgpu_kernel void @flat_system_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6477,7 +6502,8 @@ define amdgpu_kernel void @flat_system_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -6754,7 +6780,8 @@ define amdgpu_kernel void @flat_system_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -6776,7 +6803,8 @@ define amdgpu_kernel void @flat_system_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -7053,7 +7081,8 @@ define amdgpu_kernel void @flat_system_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -7075,7 +7104,8 @@ define amdgpu_kernel void @flat_system_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -7352,7 +7382,8 @@ define amdgpu_kernel void @flat_system_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
;
@@ -7374,7 +7405,8 @@ define amdgpu_kernel void @flat_system_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
-; GFX12-CU-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_storecnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -13852,6 +13884,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -13870,6 +13903,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -13950,6 +13984,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -13964,6 +13999,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -13978,6 +14014,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -13991,6 +14028,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -14210,6 +14248,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14230,6 +14269,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14321,6 +14361,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14337,6 +14378,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14356,6 +14398,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -14374,6 +14417,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -14416,6 +14460,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14436,6 +14481,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14527,6 +14573,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14543,6 +14590,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14562,6 +14610,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -14580,6 +14629,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -15612,6 +15662,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15644,6 +15695,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15758,6 +15810,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15776,6 +15829,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15794,6 +15848,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -15811,6 +15866,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16148,6 +16204,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16182,6 +16239,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16307,6 +16365,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16327,6 +16386,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16350,6 +16410,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16372,6 +16433,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16443,6 +16505,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16477,6 +16540,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16602,6 +16666,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16622,6 +16687,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16645,6 +16711,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16667,6 +16734,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16735,6 +16803,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16767,6 +16836,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16881,6 +16951,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16899,6 +16970,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16917,6 +16989,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16934,6 +17007,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -17002,6 +17076,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17034,6 +17109,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17148,6 +17224,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17166,6 +17243,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17184,6 +17262,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -17201,6 +17280,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -17272,6 +17352,7 @@ define amdgpu_kernel void @flat_system_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17306,6 +17387,7 @@ define amdgpu_kernel void @flat_system_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17431,6 +17513,7 @@ define amdgpu_kernel void @flat_system_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17451,6 +17534,7 @@ define amdgpu_kernel void @flat_system_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17474,6 +17558,7 @@ define amdgpu_kernel void @flat_system_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -17496,6 +17581,7 @@ define amdgpu_kernel void @flat_system_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -17567,6 +17653,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17601,6 +17688,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17726,6 +17814,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17746,6 +17835,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17769,6 +17859,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -17791,6 +17882,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -17862,6 +17954,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17896,6 +17989,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18021,6 +18115,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18041,6 +18136,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18064,6 +18160,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -18086,6 +18183,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -18157,6 +18255,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18191,6 +18290,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18316,6 +18416,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18336,6 +18437,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18359,6 +18461,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -18381,6 +18484,7 @@ define amdgpu_kernel void @flat_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -18452,6 +18556,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18486,6 +18591,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18611,6 +18717,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18631,6 +18738,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18654,6 +18762,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -18676,6 +18785,7 @@ define amdgpu_kernel void @flat_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -18747,6 +18857,7 @@ define amdgpu_kernel void @flat_system_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18781,6 +18892,7 @@ define amdgpu_kernel void @flat_system_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18906,6 +19018,7 @@ define amdgpu_kernel void @flat_system_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18926,6 +19039,7 @@ define amdgpu_kernel void @flat_system_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18949,6 +19063,7 @@ define amdgpu_kernel void @flat_system_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -18971,6 +19086,7 @@ define amdgpu_kernel void @flat_system_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -19042,6 +19158,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -19076,6 +19193,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -19201,6 +19319,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -19221,6 +19340,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -19244,6 +19364,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -19266,6 +19387,7 @@ define amdgpu_kernel void @flat_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -19337,6 +19459,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -19371,6 +19494,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -19496,6 +19620,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -19516,6 +19641,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -19539,6 +19665,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -19561,6 +19688,7 @@ define amdgpu_kernel void @flat_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll
index a88e0e217fdb4..849b26c344196 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll
@@ -459,6 +459,7 @@ define amdgpu_kernel void @flat_nontemporal_store_0(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: s_endpgm
;
@@ -478,6 +479,7 @@ define amdgpu_kernel void @flat_nontemporal_store_0(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: s_endpgm
;
@@ -508,6 +510,7 @@ define amdgpu_kernel void @flat_nontemporal_store_0(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2 dlc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: s_endpgm
;
@@ -523,6 +526,7 @@ define amdgpu_kernel void @flat_nontemporal_store_0(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2 dlc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: s_endpgm
;
@@ -542,6 +546,7 @@ define amdgpu_kernel void @flat_nontemporal_store_0(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
@@ -561,6 +566,7 @@ define amdgpu_kernel void @flat_nontemporal_store_0(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_endpgm
ptr %in, ptr %out) {
@@ -632,6 +638,7 @@ define amdgpu_kernel void @flat_nontemporal_store_1(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, v3
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: s_endpgm
;
@@ -664,6 +671,7 @@ define amdgpu_kernel void @flat_nontemporal_store_1(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, v3
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: s_endpgm
;
@@ -723,6 +731,7 @@ define amdgpu_kernel void @flat_nontemporal_store_1(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, v3
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2 dlc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: s_endpgm
;
@@ -753,6 +762,7 @@ define amdgpu_kernel void @flat_nontemporal_store_1(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, v3
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2 dlc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: s_endpgm
;
@@ -791,6 +801,7 @@ define amdgpu_kernel void @flat_nontemporal_store_1(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
@@ -829,6 +840,7 @@ define amdgpu_kernel void @flat_nontemporal_store_1(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_endpgm
ptr %in, ptr %out) {
@@ -965,7 +977,7 @@ define amdgpu_kernel void @flat_volatile_workgroup_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -1087,7 +1099,7 @@ define amdgpu_kernel void @flat_volatile_workgroup_release_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
index 7c637a20ab47b..f458f5ae69fac 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll
@@ -388,6 +388,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -406,6 +407,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -424,6 +426,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -438,6 +441,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -453,6 +457,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -467,6 +472,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -479,6 +485,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -491,6 +498,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -504,6 +512,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -518,6 +527,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -532,6 +542,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -546,6 +557,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -569,7 +581,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -587,7 +601,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -605,7 +621,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -619,7 +637,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -634,7 +654,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -648,7 +670,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -660,7 +684,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -672,7 +698,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -685,7 +713,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -699,7 +729,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -713,7 +745,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -727,7 +761,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -1050,6 +1086,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1065,6 +1102,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1080,6 +1118,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -1091,6 +1130,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1103,6 +1143,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1115,6 +1156,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1125,6 +1167,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1135,6 +1178,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1146,6 +1190,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1157,6 +1202,7 @@ define amdgpu_kernel void @flat_wavefront_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -1199,6 +1245,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1214,6 +1261,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1229,6 +1277,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -1240,6 +1289,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1252,6 +1302,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1264,6 +1315,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1274,6 +1326,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1284,6 +1337,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1295,6 +1349,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1306,6 +1361,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -1498,6 +1554,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1513,6 +1570,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1528,6 +1586,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1539,6 +1598,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1551,6 +1611,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1563,6 +1624,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1573,6 +1635,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1583,6 +1646,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1594,6 +1658,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1605,6 +1670,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1616,6 +1682,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acquire_atomicrmw:
@@ -1627,6 +1694,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -1646,6 +1714,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1661,6 +1730,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1676,6 +1746,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -1687,6 +1758,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1699,6 +1771,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1711,6 +1784,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1721,6 +1795,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1731,6 +1806,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1742,6 +1818,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1753,6 +1830,7 @@ define amdgpu_kernel void @flat_wavefront_release_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -1795,7 +1873,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1810,7 +1890,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1825,7 +1907,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1836,7 +1920,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1848,7 +1934,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1860,7 +1948,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1870,7 +1960,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1880,7 +1972,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1891,7 +1985,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1902,7 +1998,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1914,6 +2012,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acq_rel_atomicrmw:
@@ -1925,6 +2024,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -1944,7 +2044,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -1959,7 +2061,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -1974,7 +2078,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -1985,7 +2091,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -1997,7 +2105,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2009,7 +2119,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2019,7 +2131,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2029,7 +2143,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2040,7 +2156,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2051,7 +2169,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2063,6 +2183,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_seq_cst_atomicrmw:
@@ -2074,6 +2195,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -2094,6 +2216,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2113,6 +2236,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2132,6 +2256,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2147,6 +2272,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2163,6 +2289,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2178,6 +2305,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2191,6 +2319,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2204,6 +2333,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2218,6 +2348,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2233,6 +2364,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2248,6 +2380,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2263,6 +2396,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2287,7 +2421,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2306,7 +2442,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2325,7 +2463,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2340,7 +2480,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2356,7 +2498,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2371,7 +2515,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2384,7 +2530,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2397,7 +2545,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2411,7 +2561,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2426,7 +2578,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2442,6 +2596,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2457,6 +2612,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2481,7 +2637,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2500,7 +2658,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2519,7 +2679,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2534,7 +2696,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2550,7 +2714,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2565,7 +2731,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2578,7 +2746,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2591,7 +2761,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -2605,7 +2777,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2620,7 +2794,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -2636,6 +2812,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2651,6 +2828,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -2928,6 +3106,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -2957,6 +3136,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -2986,6 +3166,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3011,6 +3192,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3027,6 +3209,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3043,6 +3226,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3057,6 +3241,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3071,6 +3256,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3086,6 +3272,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3101,6 +3288,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3116,6 +3304,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
@@ -3131,6 +3320,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3165,6 +3355,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -3194,6 +3385,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -3223,6 +3415,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-CU-NEXT: s_endpgm
;
@@ -3248,6 +3441,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -3264,6 +3458,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3280,6 +3475,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3294,6 +3490,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3308,6 +3505,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3323,6 +3521,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -3338,6 +3537,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -3403,7 +3603,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3432,7 +3634,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3461,7 +3665,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3486,7 +3692,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3502,7 +3710,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3518,7 +3728,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3532,7 +3744,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3546,7 +3760,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3561,7 +3777,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3576,7 +3794,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3592,6 +3812,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3607,6 +3828,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3641,7 +3863,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3670,7 +3894,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3699,7 +3925,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3724,7 +3952,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3740,7 +3970,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3756,7 +3988,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3770,7 +4004,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3784,7 +4020,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3799,7 +4037,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3814,7 +4054,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3830,6 +4072,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3845,6 +4088,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3880,6 +4124,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -3909,6 +4154,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -3938,6 +4184,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -3963,6 +4210,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -3979,6 +4227,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -3995,6 +4244,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -4009,6 +4259,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -4023,6 +4274,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -4038,6 +4290,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -4053,6 +4306,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -4068,6 +4322,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
@@ -4083,6 +4338,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4118,6 +4374,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4147,6 +4404,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4176,6 +4434,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4201,6 +4460,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4217,6 +4477,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4233,6 +4494,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4247,6 +4509,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4261,6 +4524,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4276,6 +4540,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4291,6 +4556,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4306,6 +4572,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
@@ -4321,6 +4588,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4355,7 +4623,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4384,7 +4654,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4413,7 +4685,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4438,7 +4712,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4454,7 +4730,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4470,7 +4748,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4484,7 +4764,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4498,7 +4780,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4513,7 +4797,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4528,7 +4814,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4544,6 +4832,7 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_release_acquire_cmpxchg:
@@ -4559,6 +4848,7 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4593,7 +4883,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4622,7 +4914,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4651,7 +4945,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4676,7 +4972,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4692,7 +4990,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4708,7 +5008,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4722,7 +5024,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4736,7 +5040,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4751,7 +5057,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4766,7 +5074,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4782,6 +5092,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
@@ -4797,6 +5108,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4831,7 +5143,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4860,7 +5174,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4889,7 +5205,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4914,7 +5232,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4930,7 +5250,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4946,7 +5268,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4960,7 +5284,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4974,7 +5300,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -4989,7 +5317,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -5004,7 +5334,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -5020,6 +5352,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
@@ -5035,6 +5368,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5069,7 +5403,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5098,7 +5434,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5127,7 +5465,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5152,7 +5492,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5168,7 +5510,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5184,7 +5528,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5198,7 +5544,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5212,7 +5560,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5227,7 +5577,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5242,7 +5594,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5258,6 +5612,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5273,6 +5628,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5307,7 +5663,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5336,7 +5694,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5365,7 +5725,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5390,7 +5752,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5406,7 +5770,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5422,7 +5788,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5436,7 +5804,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5450,7 +5820,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5465,7 +5837,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5480,7 +5854,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5496,6 +5872,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
@@ -5511,6 +5888,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5545,7 +5923,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5574,7 +5954,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5603,7 +5985,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5628,7 +6012,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5644,7 +6030,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5660,7 +6048,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5674,7 +6064,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5688,7 +6080,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5703,7 +6097,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5718,7 +6114,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5734,6 +6132,7 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
@@ -5749,6 +6148,7 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5783,7 +6183,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5812,7 +6214,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5841,7 +6245,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5866,7 +6272,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5882,7 +6290,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5898,7 +6308,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5912,7 +6324,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5926,7 +6340,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5941,7 +6357,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5956,7 +6374,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5972,6 +6392,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5987,6 +6408,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -6021,7 +6443,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6050,7 +6474,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6079,7 +6505,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6104,7 +6532,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6120,7 +6550,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6136,7 +6568,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6150,7 +6584,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6164,7 +6600,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6179,7 +6617,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6194,7 +6634,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6210,6 +6652,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -6225,6 +6668,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -6544,6 +6988,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6577,6 +7022,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6610,6 +7056,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6639,6 +7086,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6659,6 +7107,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6678,6 +7127,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6695,6 +7145,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6712,6 +7163,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -6730,6 +7182,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6749,6 +7202,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6768,6 +7222,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -6787,6 +7242,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -6827,6 +7283,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -6860,6 +7317,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
@@ -6893,6 +7351,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
@@ -6922,6 +7381,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
@@ -6942,6 +7402,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6961,6 +7422,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
@@ -6978,6 +7440,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -6995,6 +7458,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
@@ -7013,6 +7477,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
@@ -7032,6 +7497,7 @@ define amdgpu_kernel void @flat_wavefront_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
@@ -7111,7 +7577,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7144,7 +7612,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7177,7 +7647,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7206,7 +7678,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7226,7 +7700,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7245,7 +7721,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7262,7 +7740,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7279,7 +7759,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7297,7 +7779,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7316,7 +7800,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7336,6 +7822,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7355,6 +7842,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7395,7 +7883,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7428,7 +7918,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7461,7 +7953,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7490,7 +7984,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7510,7 +8006,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7529,7 +8027,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7546,7 +8046,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7563,7 +8065,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7581,7 +8085,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7600,7 +8106,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7620,6 +8128,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7639,6 +8148,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7680,6 +8190,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7713,6 +8224,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7746,6 +8258,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7775,6 +8288,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7795,6 +8309,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7814,6 +8329,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7831,6 +8347,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7848,6 +8365,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -7866,6 +8384,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7885,6 +8404,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7904,6 +8424,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7923,6 +8444,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -7964,6 +8486,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -7997,6 +8520,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8030,6 +8554,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8059,6 +8584,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8079,6 +8605,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8098,6 +8625,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8115,6 +8643,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8132,6 +8661,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8150,6 +8680,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8169,6 +8700,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8188,6 +8720,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8207,6 +8740,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8247,7 +8781,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8280,7 +8816,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8313,7 +8851,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8342,7 +8882,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8362,7 +8904,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8381,7 +8925,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8398,7 +8944,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8415,7 +8963,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8433,7 +8983,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8452,7 +9004,9 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8472,6 +9026,7 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8491,6 +9046,7 @@ define amdgpu_kernel void @flat_wavefront_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8531,7 +9087,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8564,7 +9122,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8597,7 +9157,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8626,7 +9188,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8646,7 +9210,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8665,7 +9231,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8682,7 +9250,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8699,7 +9269,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8717,7 +9289,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8736,7 +9310,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8756,6 +9332,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8775,6 +9352,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -8815,7 +9393,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8848,7 +9428,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8881,7 +9463,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8910,7 +9494,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -8930,7 +9516,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8949,7 +9537,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8966,7 +9556,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -8983,7 +9575,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9001,7 +9595,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9020,7 +9616,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9040,6 +9638,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9059,6 +9658,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9099,7 +9699,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9132,7 +9734,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9165,7 +9769,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9194,7 +9800,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9214,7 +9822,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9233,7 +9843,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9250,7 +9862,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9267,7 +9881,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9285,7 +9901,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9304,7 +9922,9 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9324,6 +9944,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9343,6 +9964,7 @@ define amdgpu_kernel void @flat_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9383,7 +10005,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9416,7 +10040,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9449,7 +10075,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9478,7 +10106,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9498,7 +10128,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9517,7 +10149,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9534,7 +10168,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9551,7 +10187,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9569,7 +10207,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9588,7 +10228,9 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9608,6 +10250,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9627,6 +10270,7 @@ define amdgpu_kernel void @flat_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9667,7 +10311,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9700,7 +10346,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9733,7 +10381,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9762,7 +10412,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9782,7 +10434,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9801,7 +10455,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9818,7 +10474,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9835,7 +10493,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -9853,7 +10513,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9872,7 +10534,9 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9892,6 +10556,7 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9911,6 +10576,7 @@ define amdgpu_kernel void @flat_wavefront_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -9951,7 +10617,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -9984,7 +10652,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10017,7 +10687,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10046,7 +10718,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10066,7 +10740,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10085,7 +10761,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10102,7 +10780,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10119,7 +10799,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10137,7 +10819,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10156,7 +10840,9 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10176,6 +10862,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10195,6 +10882,7 @@ define amdgpu_kernel void @flat_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10235,7 +10923,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10268,7 +10958,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10301,7 +10993,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10330,7 +11024,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10350,7 +11046,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10369,7 +11067,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10386,7 +11086,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10403,7 +11105,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10421,7 +11125,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10440,7 +11146,9 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10460,6 +11168,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10479,6 +11188,7 @@ define amdgpu_kernel void @flat_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -10869,6 +11579,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10887,6 +11598,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10905,6 +11617,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10919,6 +11632,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10934,6 +11648,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10948,6 +11663,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10960,6 +11676,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10972,6 +11689,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10985,6 +11703,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10999,6 +11718,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11013,6 +11733,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11027,6 +11748,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11050,7 +11772,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11068,7 +11792,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11086,7 +11812,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11100,7 +11828,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11115,7 +11845,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11129,7 +11861,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11141,7 +11875,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11153,7 +11889,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11166,7 +11904,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11180,7 +11920,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11194,7 +11936,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11208,7 +11952,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11531,6 +12277,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11546,6 +12293,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11561,6 +12309,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -11572,6 +12321,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11584,6 +12334,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11596,6 +12347,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11606,6 +12358,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11616,6 +12369,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11627,6 +12381,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11638,6 +12393,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -11680,6 +12436,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11695,6 +12452,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_store_dword v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11710,6 +12468,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -11721,6 +12480,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11733,6 +12493,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11745,6 +12506,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11755,6 +12517,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11765,6 +12528,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11776,6 +12540,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11787,6 +12552,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -11979,6 +12745,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -11994,6 +12761,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12009,6 +12777,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12020,6 +12789,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12032,6 +12802,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12044,6 +12815,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12054,6 +12826,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12064,6 +12837,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12075,6 +12849,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12086,6 +12861,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12097,6 +12873,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
@@ -12108,6 +12885,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12127,6 +12905,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -12142,6 +12921,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-WGP-NEXT: s_endpgm
;
@@ -12157,6 +12937,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -12168,6 +12949,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -12180,6 +12962,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12192,6 +12975,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -12202,6 +12986,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12212,6 +12997,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -12223,6 +13009,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-WGP-NEXT: s_endpgm
;
@@ -12234,6 +13021,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -12276,7 +13064,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12291,7 +13081,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12306,7 +13098,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12317,7 +13111,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12329,7 +13125,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12341,7 +13139,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12351,7 +13151,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12361,7 +13163,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12372,7 +13176,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12383,7 +13189,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12395,6 +13203,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
@@ -12406,6 +13215,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12425,7 +13235,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12440,7 +13252,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12455,7 +13269,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12466,7 +13282,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12478,7 +13296,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12490,7 +13310,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12500,7 +13322,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12510,7 +13334,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12521,7 +13347,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12532,7 +13360,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12544,6 +13374,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
@@ -12555,6 +13386,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12575,6 +13407,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12594,6 +13427,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12613,6 +13447,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12628,6 +13463,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12644,6 +13480,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12659,6 +13496,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12672,6 +13510,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12685,6 +13524,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12699,6 +13539,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12714,6 +13555,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12729,6 +13571,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12744,6 +13587,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12768,7 +13612,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12787,7 +13633,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12806,7 +13654,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12821,7 +13671,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12837,7 +13689,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12852,7 +13706,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12865,7 +13721,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12878,7 +13736,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12892,7 +13752,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12907,7 +13769,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12923,6 +13787,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12938,6 +13803,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12962,7 +13828,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12981,7 +13849,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13000,7 +13870,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13015,7 +13887,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13031,7 +13905,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13046,7 +13922,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13059,7 +13937,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13072,7 +13952,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13086,7 +13968,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13101,7 +13985,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13117,6 +14003,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s2
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -13132,6 +14019,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -13409,6 +14297,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13438,6 +14327,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13467,6 +14357,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13492,6 +14383,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13508,6 +14400,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13524,6 +14417,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13538,6 +14432,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13552,6 +14447,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13567,6 +14463,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13582,6 +14479,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13597,6 +14495,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -13612,6 +14511,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -13646,6 +14546,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -13675,6 +14576,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -13704,6 +14606,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-CU-NEXT: s_endpgm
;
@@ -13729,6 +14632,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -13745,6 +14649,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13761,6 +14666,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -13775,6 +14681,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13789,6 +14696,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -13804,6 +14712,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -13819,6 +14728,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -13884,7 +14794,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13913,7 +14825,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13942,7 +14856,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13967,7 +14883,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13983,7 +14901,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13999,7 +14919,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -14013,7 +14935,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -14027,7 +14951,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -14042,7 +14968,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -14057,7 +14985,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -14073,6 +15003,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -14088,6 +15019,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14122,7 +15054,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14151,7 +15085,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14180,7 +15116,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14205,7 +15143,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14221,7 +15161,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14237,7 +15179,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14251,7 +15195,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14265,7 +15211,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14280,7 +15228,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14295,7 +15245,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14311,6 +15263,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -14326,6 +15279,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14361,6 +15315,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14390,6 +15345,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14419,6 +15375,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14444,6 +15401,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14460,6 +15418,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14476,6 +15435,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14490,6 +15450,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14504,6 +15465,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14519,6 +15481,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14534,6 +15497,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14549,6 +15513,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -14564,6 +15529,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14599,6 +15565,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14628,6 +15595,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14657,6 +15625,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14682,6 +15651,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14698,6 +15668,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14714,6 +15685,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14728,6 +15700,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14742,6 +15715,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14757,6 +15731,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14772,6 +15747,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14787,6 +15763,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -14802,6 +15779,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14836,7 +15814,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14865,7 +15845,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14894,7 +15876,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14919,7 +15903,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14935,7 +15921,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14951,7 +15939,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14965,7 +15955,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14979,7 +15971,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -14994,7 +15988,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -15009,7 +16005,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -15025,6 +16023,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
@@ -15040,6 +16039,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15074,7 +16074,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15103,7 +16105,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15132,7 +16136,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15157,7 +16163,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15173,7 +16181,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15189,7 +16199,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15203,7 +16215,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15217,7 +16231,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15232,7 +16248,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15247,7 +16265,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15263,6 +16283,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -15278,6 +16299,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15312,7 +16334,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15341,7 +16365,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15370,7 +16396,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15395,7 +16423,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15411,7 +16441,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15427,7 +16459,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15441,7 +16475,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15455,7 +16491,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15470,7 +16508,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15485,7 +16525,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15501,6 +16543,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -15516,6 +16559,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15550,7 +16594,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15579,7 +16625,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15608,7 +16656,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15633,7 +16683,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15649,7 +16701,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15665,7 +16719,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15679,7 +16735,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15693,7 +16751,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15708,7 +16768,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15723,7 +16785,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15739,6 +16803,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -15754,6 +16819,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15788,7 +16854,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15817,7 +16885,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15846,7 +16916,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15871,7 +16943,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15887,7 +16961,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15903,7 +16979,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15917,7 +16995,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15931,7 +17011,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15946,7 +17028,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15961,7 +17045,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15977,6 +17063,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15992,6 +17079,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16026,7 +17114,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16055,7 +17145,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16084,7 +17176,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16109,7 +17203,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16125,7 +17221,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16141,7 +17239,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16155,7 +17255,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16169,7 +17271,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16184,7 +17288,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16199,7 +17305,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16215,6 +17323,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -16230,6 +17339,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16264,7 +17374,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16293,7 +17405,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16322,7 +17436,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16347,7 +17463,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16363,7 +17481,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16379,7 +17499,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16393,7 +17515,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16407,7 +17531,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16422,7 +17548,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16437,7 +17565,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16453,6 +17583,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16468,6 +17599,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16502,7 +17634,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16531,7 +17665,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16560,7 +17696,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16585,7 +17723,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16601,7 +17741,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16617,7 +17759,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16631,7 +17775,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16645,7 +17791,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16660,7 +17808,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16675,7 +17825,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16691,6 +17843,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16706,6 +17859,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -17025,6 +18179,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17058,6 +18213,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17091,6 +18247,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17120,6 +18277,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17140,6 +18298,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17159,6 +18318,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17176,6 +18336,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17193,6 +18354,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17211,6 +18373,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17230,6 +18393,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17249,6 +18413,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17268,6 +18433,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17308,7 +18474,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17341,7 +18509,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17374,7 +18544,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17403,7 +18575,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17423,7 +18597,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17442,7 +18618,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17459,7 +18637,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17476,7 +18656,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17494,7 +18676,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17513,7 +18697,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17533,6 +18719,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17552,6 +18739,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17592,7 +18780,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17625,7 +18815,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17658,7 +18850,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17687,7 +18881,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17707,7 +18903,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17726,7 +18924,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17743,7 +18943,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17760,7 +18962,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17778,7 +18982,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17797,7 +19003,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17817,6 +19025,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17836,6 +19045,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17877,6 +19087,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17910,6 +19121,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17943,6 +19155,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17972,6 +19185,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17992,6 +19206,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18011,6 +19226,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18028,6 +19244,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18045,6 +19262,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18063,6 +19281,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18082,6 +19301,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18101,6 +19321,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18120,6 +19341,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18161,6 +19383,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18194,6 +19417,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18227,6 +19451,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18256,6 +19481,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18276,6 +19502,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18295,6 +19522,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18312,6 +19540,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18329,6 +19558,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18347,6 +19577,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18366,6 +19597,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18385,6 +19617,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18404,6 +19637,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18444,7 +19678,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18477,7 +19713,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18510,7 +19748,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18539,7 +19779,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18559,7 +19801,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18578,7 +19822,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18595,7 +19841,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18612,7 +19860,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18630,7 +19880,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18649,7 +19901,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18669,6 +19923,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18688,6 +19943,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18728,7 +19984,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18761,7 +20019,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18794,7 +20054,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18823,7 +20085,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18843,7 +20107,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18862,7 +20128,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18879,7 +20147,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18896,7 +20166,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18914,7 +20186,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18933,7 +20207,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18953,6 +20229,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18972,6 +20249,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19012,7 +20290,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19045,7 +20325,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19078,7 +20360,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19107,7 +20391,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19127,7 +20413,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19146,7 +20434,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19163,7 +20453,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19180,7 +20472,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19198,7 +20492,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19217,7 +20513,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19237,6 +20535,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19256,6 +20555,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19296,7 +20596,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19329,7 +20631,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19362,7 +20666,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19391,7 +20697,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19411,7 +20719,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19430,7 +20740,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19447,7 +20759,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19464,7 +20778,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19482,7 +20798,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19501,7 +20819,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19521,6 +20841,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19540,6 +20861,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19580,7 +20902,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19613,7 +20937,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19646,7 +20972,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19675,7 +21003,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19695,7 +21025,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19714,7 +21046,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19731,7 +21065,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19748,7 +21084,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19766,7 +21104,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19785,7 +21125,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19805,6 +21147,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19824,6 +21167,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19864,7 +21208,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19897,7 +21243,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19930,7 +21278,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19959,7 +21309,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19979,7 +21331,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19998,7 +21352,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20015,7 +21371,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20032,7 +21390,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20050,7 +21410,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20069,7 +21431,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20089,6 +21453,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20108,6 +21473,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20148,7 +21514,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20181,7 +21549,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20214,7 +21584,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20243,7 +21615,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20263,7 +21637,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20282,7 +21658,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20299,7 +21677,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20316,7 +21696,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20334,7 +21716,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20353,7 +21737,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20373,6 +21759,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20392,6 +21779,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20432,7 +21820,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20465,7 +21855,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20498,7 +21890,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20527,7 +21921,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20547,7 +21943,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20566,7 +21964,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20583,7 +21983,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20600,7 +22002,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20618,7 +22022,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, v0
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20637,7 +22043,9 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20657,6 +22065,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20676,6 +22085,7 @@ define amdgpu_kernel void @flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
index 0fd4aa4a7a93f..be7ca1f19ad3f 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
@@ -557,7 +557,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -768,9 +768,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -1238,7 +1238,7 @@ define amdgpu_kernel void @flat_workgroup_release_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr %out) {
@@ -1404,7 +1404,7 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr %out) {
@@ -1709,7 +1709,8 @@ define amdgpu_kernel void @flat_workgroup_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1722,7 +1723,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -1887,7 +1888,7 @@ define amdgpu_kernel void @flat_workgroup_release_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -2059,7 +2060,8 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2071,9 +2073,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -2244,7 +2246,8 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2256,9 +2259,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -2459,7 +2462,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -2682,9 +2685,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -2907,9 +2910,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -3391,7 +3394,8 @@ define amdgpu_kernel void @flat_workgroup_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3408,7 +3412,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -3662,7 +3666,7 @@ define amdgpu_kernel void @flat_workgroup_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -3919,7 +3923,8 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3935,9 +3940,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4193,7 +4198,8 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4209,9 +4215,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4451,7 +4457,8 @@ define amdgpu_kernel void @flat_workgroup_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4468,7 +4475,7 @@ define amdgpu_kernel void @flat_workgroup_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4708,7 +4715,8 @@ define amdgpu_kernel void @flat_workgroup_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4725,7 +4733,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -4981,7 +4989,8 @@ define amdgpu_kernel void @flat_workgroup_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4997,9 +5006,9 @@ define amdgpu_kernel void @flat_workgroup_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5255,7 +5264,8 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5271,9 +5281,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5529,7 +5539,8 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5545,9 +5556,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -5803,7 +5814,8 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
-; GFX12-WGP-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_storecnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5819,9 +5831,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -6395,7 +6407,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -6695,7 +6707,7 @@ define amdgpu_kernel void @flat_workgroup_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
@@ -7009,9 +7021,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -7324,9 +7336,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -7624,7 +7636,7 @@ define amdgpu_kernel void @flat_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -7920,7 +7932,7 @@ define amdgpu_kernel void @flat_workgroup_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -8233,9 +8245,9 @@ define amdgpu_kernel void @flat_workgroup_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -8548,9 +8560,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -8863,9 +8875,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -9178,9 +9190,9 @@ define amdgpu_kernel void @flat_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -9491,9 +9503,9 @@ define amdgpu_kernel void @flat_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -9806,9 +9818,9 @@ define amdgpu_kernel void @flat_workgroup_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -10121,9 +10133,9 @@ define amdgpu_kernel void @flat_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -10436,9 +10448,9 @@ define amdgpu_kernel void @flat_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -10829,6 +10841,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10867,6 +10880,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10881,6 +10895,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10896,6 +10911,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10923,6 +10939,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -10965,6 +10982,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -10995,6 +11013,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11018,7 +11037,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11058,7 +11079,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_load_dword v2, v[0:1]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11072,7 +11095,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_load_dword v2, v[0:1]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11087,7 +11112,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11115,7 +11142,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -11160,7 +11189,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -11196,7 +11227,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_load_b32 v2, v[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -11519,6 +11552,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11551,6 +11585,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -11562,6 +11597,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11574,6 +11610,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11597,6 +11634,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11632,6 +11670,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -11658,6 +11697,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr %out) {
@@ -11678,6 +11718,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11710,6 +11751,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_store_dword v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -11721,6 +11763,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11733,6 +11776,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11756,6 +11800,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11791,6 +11836,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -11817,6 +11863,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_store_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr %out) {
@@ -11987,6 +12034,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
@@ -12002,6 +12050,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s7
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -12019,6 +12068,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
@@ -12030,6 +12080,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
@@ -12042,6 +12093,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
@@ -12066,6 +12118,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
@@ -12089,6 +12142,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -12102,6 +12156,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
@@ -12113,6 +12168,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s3
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -12126,6 +12182,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12145,6 +12202,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -12177,6 +12235,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
; GFX10-CU-NEXT: s_endpgm
;
@@ -12188,6 +12247,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -12200,6 +12260,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12223,6 +12284,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12258,6 +12320,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX11-CU-NEXT: s_endpgm
;
@@ -12284,6 +12347,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
@@ -12304,7 +12368,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
@@ -12322,6 +12388,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -12338,7 +12405,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
@@ -12349,7 +12418,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
@@ -12361,7 +12432,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
@@ -12386,7 +12459,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
@@ -12413,6 +12488,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -12425,7 +12501,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
@@ -12441,6 +12519,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -12453,7 +12532,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12473,7 +12554,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
@@ -12491,6 +12574,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -12507,7 +12591,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v[0:1], v2
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
@@ -12518,7 +12604,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v[0:1], v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
@@ -12530,7 +12618,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
@@ -12555,7 +12645,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
@@ -12582,6 +12674,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -12594,7 +12687,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
@@ -12610,6 +12705,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -12622,7 +12718,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s3
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in) {
entry:
@@ -12643,6 +12741,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12683,6 +12782,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12698,6 +12798,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12714,6 +12815,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12743,6 +12845,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12788,6 +12891,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12820,6 +12924,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -12844,7 +12949,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12886,7 +12993,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12901,7 +13010,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -12917,7 +13028,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12947,7 +13060,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -12995,7 +13110,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13033,7 +13150,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -13058,7 +13177,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13100,7 +13221,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13115,7 +13238,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13131,7 +13256,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13161,7 +13288,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -13209,7 +13338,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -13247,7 +13378,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_swap_b32 v2, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -13525,6 +13658,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13554,6 +13688,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -13585,6 +13720,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13610,6 +13746,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13626,6 +13763,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13658,6 +13796,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13689,6 +13828,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -13706,6 +13846,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13721,6 +13862,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -13738,6 +13880,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -13772,6 +13915,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -13832,6 +13976,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX10-CU-NEXT: s_endpgm
;
@@ -13857,6 +14002,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -13873,6 +14019,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13904,6 +14051,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13951,6 +14099,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -13985,6 +14134,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
@@ -14020,7 +14170,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14052,6 +14204,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14082,7 +14235,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14107,7 +14262,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14123,7 +14280,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14156,7 +14315,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14191,6 +14352,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14207,7 +14369,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14227,6 +14391,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14243,7 +14408,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14278,7 +14445,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14310,6 +14479,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14340,7 +14510,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14365,7 +14537,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14381,7 +14555,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14414,7 +14590,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14449,6 +14627,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14465,7 +14644,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14485,6 +14666,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14501,7 +14683,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14537,6 +14721,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14566,6 +14751,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14597,6 +14783,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14622,6 +14809,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14638,6 +14826,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14670,6 +14859,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14701,6 +14891,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14718,6 +14909,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14733,6 +14925,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14750,6 +14943,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -14785,6 +14979,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14814,6 +15009,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14845,6 +15041,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14870,6 +15067,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14886,6 +15084,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14918,6 +15117,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14949,6 +15149,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14966,6 +15167,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14981,6 +15183,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14998,6 +15201,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15032,7 +15236,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
@@ -15064,6 +15270,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15094,7 +15301,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
@@ -15119,7 +15328,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
@@ -15135,7 +15346,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
@@ -15168,7 +15381,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
@@ -15203,6 +15418,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15219,7 +15435,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
@@ -15239,6 +15457,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15255,7 +15474,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15290,7 +15511,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15322,6 +15545,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15352,7 +15576,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15377,7 +15603,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15393,7 +15621,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15426,7 +15656,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15461,6 +15693,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15477,7 +15710,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15497,6 +15732,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15513,7 +15749,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15548,7 +15786,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15580,6 +15820,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15610,7 +15851,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15635,7 +15878,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15651,7 +15896,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15684,7 +15931,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15719,6 +15968,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15735,7 +15985,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15755,6 +16007,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15771,7 +16024,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -15806,7 +16061,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15838,6 +16095,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15868,7 +16126,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15893,7 +16153,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15909,7 +16171,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15942,7 +16206,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15977,6 +16243,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15993,7 +16260,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -16013,6 +16282,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16029,7 +16299,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16064,7 +16336,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16096,6 +16370,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16126,7 +16401,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16151,7 +16428,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16167,7 +16446,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16200,7 +16481,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16235,6 +16518,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16251,7 +16535,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16271,6 +16557,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16287,7 +16574,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16322,7 +16611,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16354,6 +16645,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16384,7 +16676,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16409,7 +16703,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16425,7 +16721,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16458,7 +16756,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16493,6 +16793,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16509,7 +16810,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16529,6 +16832,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16545,7 +16849,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16580,7 +16886,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16612,6 +16920,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16642,7 +16951,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16667,7 +16978,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16683,7 +16996,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16716,7 +17031,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16751,6 +17068,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16767,7 +17085,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16787,6 +17107,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16803,7 +17124,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -16838,7 +17161,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16870,6 +17195,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16900,7 +17226,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16925,7 +17253,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16941,7 +17271,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16974,7 +17306,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -17009,6 +17343,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -17025,7 +17360,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -17045,6 +17382,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -17061,7 +17399,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr %out, i32 %in, i32 %old) {
entry:
@@ -17381,6 +17721,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17449,6 +17790,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17478,6 +17820,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17498,6 +17841,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17535,6 +17879,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -17592,6 +17937,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17632,6 +17978,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -17672,6 +18019,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -17740,6 +18088,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
@@ -17769,6 +18118,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
@@ -17789,6 +18139,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17826,6 +18177,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -17883,6 +18235,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
@@ -17925,6 +18278,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
@@ -17966,7 +18320,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18036,7 +18392,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18065,7 +18423,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18085,7 +18445,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18123,7 +18485,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18183,7 +18547,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18229,7 +18595,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18270,7 +18638,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18340,7 +18710,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18369,7 +18741,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18389,7 +18763,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18427,7 +18803,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18487,7 +18865,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18533,7 +18913,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18575,6 +18957,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18643,6 +19026,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18672,6 +19056,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18692,6 +19077,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18729,6 +19115,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -18786,6 +19173,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18828,6 +19216,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -18869,6 +19258,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18937,6 +19327,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18966,6 +19357,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -18986,6 +19378,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19023,6 +19416,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19080,6 +19474,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19120,6 +19515,7 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19160,7 +19556,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19230,7 +19628,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19259,7 +19659,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19279,7 +19681,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19317,7 +19721,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19377,7 +19783,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19423,7 +19831,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19464,7 +19874,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19534,7 +19946,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19563,7 +19977,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19583,7 +19999,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19621,7 +20039,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19681,7 +20101,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19727,7 +20149,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -19768,7 +20192,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19838,7 +20264,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19867,7 +20295,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -19887,7 +20317,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19925,7 +20357,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -19985,7 +20419,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20031,7 +20467,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20072,7 +20510,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20142,7 +20582,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20171,7 +20613,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20191,7 +20635,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20229,7 +20675,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20289,7 +20737,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20335,7 +20785,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20376,7 +20828,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20446,7 +20900,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20475,7 +20931,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20495,7 +20953,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20533,7 +20993,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20593,7 +21055,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20637,7 +21101,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20678,7 +21144,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20748,7 +21216,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20777,7 +21247,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20797,7 +21269,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20835,7 +21309,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -20895,7 +21371,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -20941,7 +21419,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -20982,7 +21462,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21052,7 +21534,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21081,7 +21565,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21101,7 +21587,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -21139,7 +21627,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -21199,7 +21689,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21245,7 +21737,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
@@ -21286,7 +21780,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21356,7 +21852,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s7
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21385,7 +21883,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v3, v0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21405,7 +21905,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 glc
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -21443,7 +21945,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
@@ -21503,7 +22007,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 glc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -21549,7 +22055,9 @@ define amdgpu_kernel void @flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, v0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: flat_atomic_cmpswap_b32 v2, v[0:1], v[2:3] offset:16 th:TH_ATOMIC_RETURN
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
index 74a72e04fa4ae..09730c389d76e 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
@@ -1672,6 +1672,7 @@ define amdgpu_kernel void @global_agent_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -1685,6 +1686,7 @@ define amdgpu_kernel void @global_agent_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -1764,6 +1766,7 @@ define amdgpu_kernel void @global_agent_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -1777,6 +1780,7 @@ define amdgpu_kernel void @global_agent_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -1790,6 +1794,7 @@ define amdgpu_kernel void @global_agent_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -1802,6 +1807,7 @@ define amdgpu_kernel void @global_agent_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -2038,6 +2044,7 @@ define amdgpu_kernel void @global_agent_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -2053,6 +2060,7 @@ define amdgpu_kernel void @global_agent_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -2141,6 +2149,7 @@ define amdgpu_kernel void @global_agent_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -2156,6 +2165,7 @@ define amdgpu_kernel void @global_agent_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -2173,6 +2183,7 @@ define amdgpu_kernel void @global_agent_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -2189,6 +2200,7 @@ define amdgpu_kernel void @global_agent_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -2246,6 +2258,7 @@ define amdgpu_kernel void @global_agent_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -2261,6 +2274,7 @@ define amdgpu_kernel void @global_agent_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -2349,6 +2363,7 @@ define amdgpu_kernel void @global_agent_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -2364,6 +2379,7 @@ define amdgpu_kernel void @global_agent_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -2381,6 +2397,7 @@ define amdgpu_kernel void @global_agent_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -2397,6 +2414,7 @@ define amdgpu_kernel void @global_agent_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -3347,6 +3365,7 @@ define amdgpu_kernel void @global_agent_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -3364,6 +3383,7 @@ define amdgpu_kernel void @global_agent_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -3468,6 +3488,7 @@ define amdgpu_kernel void @global_agent_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -3485,6 +3506,7 @@ define amdgpu_kernel void @global_agent_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -3502,6 +3524,7 @@ define amdgpu_kernel void @global_agent_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -3518,6 +3541,7 @@ define amdgpu_kernel void @global_agent_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -3843,6 +3867,7 @@ define amdgpu_kernel void @global_agent_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -3862,6 +3887,7 @@ define amdgpu_kernel void @global_agent_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -3975,6 +4001,7 @@ define amdgpu_kernel void @global_agent_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -3994,6 +4021,7 @@ define amdgpu_kernel void @global_agent_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4015,6 +4043,7 @@ define amdgpu_kernel void @global_agent_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -4035,6 +4064,7 @@ define amdgpu_kernel void @global_agent_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -4116,6 +4146,7 @@ define amdgpu_kernel void @global_agent_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4135,6 +4166,7 @@ define amdgpu_kernel void @global_agent_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4248,6 +4280,7 @@ define amdgpu_kernel void @global_agent_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4267,6 +4300,7 @@ define amdgpu_kernel void @global_agent_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4288,6 +4322,7 @@ define amdgpu_kernel void @global_agent_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -4308,6 +4343,7 @@ define amdgpu_kernel void @global_agent_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -4385,6 +4421,7 @@ define amdgpu_kernel void @global_agent_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4402,6 +4439,7 @@ define amdgpu_kernel void @global_agent_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4506,6 +4544,7 @@ define amdgpu_kernel void @global_agent_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4523,6 +4562,7 @@ define amdgpu_kernel void @global_agent_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4540,6 +4580,7 @@ define amdgpu_kernel void @global_agent_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -4556,6 +4597,7 @@ define amdgpu_kernel void @global_agent_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -4633,6 +4675,7 @@ define amdgpu_kernel void @global_agent_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4650,6 +4693,7 @@ define amdgpu_kernel void @global_agent_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4754,6 +4798,7 @@ define amdgpu_kernel void @global_agent_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4771,6 +4816,7 @@ define amdgpu_kernel void @global_agent_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4788,6 +4834,7 @@ define amdgpu_kernel void @global_agent_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -4804,6 +4851,7 @@ define amdgpu_kernel void @global_agent_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -4885,6 +4933,7 @@ define amdgpu_kernel void @global_agent_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4904,6 +4953,7 @@ define amdgpu_kernel void @global_agent_release_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5017,6 +5067,7 @@ define amdgpu_kernel void @global_agent_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5036,6 +5087,7 @@ define amdgpu_kernel void @global_agent_release_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5057,6 +5109,7 @@ define amdgpu_kernel void @global_agent_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -5077,6 +5130,7 @@ define amdgpu_kernel void @global_agent_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -5158,6 +5212,7 @@ define amdgpu_kernel void @global_agent_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5177,6 +5232,7 @@ define amdgpu_kernel void @global_agent_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5290,6 +5346,7 @@ define amdgpu_kernel void @global_agent_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5309,6 +5366,7 @@ define amdgpu_kernel void @global_agent_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5330,6 +5388,7 @@ define amdgpu_kernel void @global_agent_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -5350,6 +5409,7 @@ define amdgpu_kernel void @global_agent_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -5431,6 +5491,7 @@ define amdgpu_kernel void @global_agent_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5450,6 +5511,7 @@ define amdgpu_kernel void @global_agent_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5563,6 +5625,7 @@ define amdgpu_kernel void @global_agent_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5582,6 +5645,7 @@ define amdgpu_kernel void @global_agent_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5603,6 +5667,7 @@ define amdgpu_kernel void @global_agent_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -5623,6 +5688,7 @@ define amdgpu_kernel void @global_agent_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -5704,6 +5770,7 @@ define amdgpu_kernel void @global_agent_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5723,6 +5790,7 @@ define amdgpu_kernel void @global_agent_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5836,6 +5904,7 @@ define amdgpu_kernel void @global_agent_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5855,6 +5924,7 @@ define amdgpu_kernel void @global_agent_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5876,6 +5946,7 @@ define amdgpu_kernel void @global_agent_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -5896,6 +5967,7 @@ define amdgpu_kernel void @global_agent_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -5977,6 +6049,7 @@ define amdgpu_kernel void @global_agent_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5996,6 +6069,7 @@ define amdgpu_kernel void @global_agent_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -6109,6 +6183,7 @@ define amdgpu_kernel void @global_agent_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -6128,6 +6203,7 @@ define amdgpu_kernel void @global_agent_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -6149,6 +6225,7 @@ define amdgpu_kernel void @global_agent_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -6169,6 +6246,7 @@ define amdgpu_kernel void @global_agent_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -6250,6 +6328,7 @@ define amdgpu_kernel void @global_agent_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -6269,6 +6348,7 @@ define amdgpu_kernel void @global_agent_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -6382,6 +6462,7 @@ define amdgpu_kernel void @global_agent_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -6401,6 +6482,7 @@ define amdgpu_kernel void @global_agent_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -6422,6 +6504,7 @@ define amdgpu_kernel void @global_agent_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -6442,6 +6525,7 @@ define amdgpu_kernel void @global_agent_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -6523,6 +6607,7 @@ define amdgpu_kernel void @global_agent_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -6542,6 +6627,7 @@ define amdgpu_kernel void @global_agent_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -6655,6 +6741,7 @@ define amdgpu_kernel void @global_agent_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -6674,6 +6761,7 @@ define amdgpu_kernel void @global_agent_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -6695,6 +6783,7 @@ define amdgpu_kernel void @global_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -6715,6 +6804,7 @@ define amdgpu_kernel void @global_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -6796,6 +6886,7 @@ define amdgpu_kernel void @global_agent_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -6815,6 +6906,7 @@ define amdgpu_kernel void @global_agent_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -6928,6 +7020,7 @@ define amdgpu_kernel void @global_agent_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -6947,6 +7040,7 @@ define amdgpu_kernel void @global_agent_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -6968,6 +7062,7 @@ define amdgpu_kernel void @global_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -6988,6 +7083,7 @@ define amdgpu_kernel void @global_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -12945,6 +13041,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -12958,6 +13055,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -13037,6 +13135,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -13050,6 +13149,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -13063,6 +13163,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -13075,6 +13176,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -13311,6 +13413,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -13326,6 +13429,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -13414,6 +13518,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -13429,6 +13534,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -13446,6 +13552,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -13462,6 +13569,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -13519,6 +13627,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -13534,6 +13643,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -13622,6 +13732,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -13637,6 +13748,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -13654,6 +13766,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -13670,6 +13783,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -14620,6 +14734,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14637,6 +14752,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14741,6 +14857,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14758,6 +14875,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14775,6 +14893,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -14791,6 +14910,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -15116,6 +15236,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15135,6 +15256,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15248,6 +15370,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15267,6 +15390,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15288,6 +15412,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -15308,6 +15433,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -15389,6 +15515,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15408,6 +15535,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15521,6 +15649,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15540,6 +15669,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15561,6 +15691,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -15581,6 +15712,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -15658,6 +15790,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15675,6 +15808,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15779,6 +15913,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15796,6 +15931,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15813,6 +15949,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -15829,6 +15966,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -15906,6 +16044,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15923,6 +16062,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16027,6 +16167,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16044,6 +16185,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16061,6 +16203,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16077,6 +16220,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16158,6 +16302,7 @@ define amdgpu_kernel void @global_agent_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16177,6 +16322,7 @@ define amdgpu_kernel void @global_agent_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16290,6 +16436,7 @@ define amdgpu_kernel void @global_agent_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16309,6 +16456,7 @@ define amdgpu_kernel void @global_agent_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16330,6 +16478,7 @@ define amdgpu_kernel void @global_agent_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16350,6 +16499,7 @@ define amdgpu_kernel void @global_agent_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16431,6 +16581,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16450,6 +16601,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16563,6 +16715,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16582,6 +16735,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16603,6 +16757,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16623,6 +16778,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16704,6 +16860,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16723,6 +16880,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16836,6 +16994,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16855,6 +17014,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16876,6 +17036,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -16896,6 +17057,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -16977,6 +17139,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16996,6 +17159,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17109,6 +17273,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17128,6 +17293,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17149,6 +17315,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17169,6 +17336,7 @@ define amdgpu_kernel void @global_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17250,6 +17418,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17269,6 +17438,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17382,6 +17552,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17401,6 +17572,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17422,6 +17594,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17442,6 +17615,7 @@ define amdgpu_kernel void @global_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17523,6 +17697,7 @@ define amdgpu_kernel void @global_agent_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17542,6 +17717,7 @@ define amdgpu_kernel void @global_agent_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17655,6 +17831,7 @@ define amdgpu_kernel void @global_agent_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17674,6 +17851,7 @@ define amdgpu_kernel void @global_agent_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17695,6 +17873,7 @@ define amdgpu_kernel void @global_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17715,6 +17894,7 @@ define amdgpu_kernel void @global_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -17796,6 +17976,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -17815,6 +17996,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17928,6 +18110,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17947,6 +18130,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17968,6 +18152,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -17988,6 +18173,7 @@ define amdgpu_kernel void @global_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
@@ -18069,6 +18255,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -18088,6 +18275,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -18201,6 +18389,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -18220,6 +18409,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -18241,6 +18431,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_DEV
; GFX12-WGP-NEXT: s_endpgm
@@ -18261,6 +18452,7 @@ define amdgpu_kernel void @global_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_DEV
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
; GFX12-CU-NEXT: s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
index 8042d38716107..c9fc9e66622be 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
@@ -418,6 +418,7 @@ define amdgpu_kernel void @global_singlethread_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -586,6 +587,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_load(
; GFX6-NEXT: s_mov_b32 s5, s14
; GFX6-NEXT: s_mov_b32 s6, s13
; GFX6-NEXT: s_mov_b32 s7, s12
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_load_dword v0, off, s[8:11], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -601,7 +603,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -650,6 +654,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s1, s10
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s9
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s8
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_load_dword v0, off, s[4:7], 0
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -726,6 +731,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX12-WGP-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-WGP-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
@@ -737,6 +743,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-CU-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -1077,6 +1084,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -1091,6 +1099,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1101,6 +1110,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1111,6 +1121,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -1129,6 +1140,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1139,6 +1151,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1149,6 +1162,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1159,6 +1173,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1169,6 +1184,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1179,6 +1195,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1189,6 +1206,7 @@ define amdgpu_kernel void @global_singlethread_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -1233,6 +1251,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -1247,6 +1266,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1257,6 +1277,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1267,6 +1288,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -1285,6 +1307,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1295,6 +1318,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1305,6 +1329,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1315,6 +1340,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1325,6 +1351,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1335,6 +1362,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1345,6 +1373,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -1543,6 +1572,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1557,6 +1587,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1567,6 +1598,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1577,6 +1609,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1594,6 +1627,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1604,6 +1638,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1614,6 +1649,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1624,6 +1660,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1634,6 +1671,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1644,6 +1682,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1654,6 +1693,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1664,6 +1704,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acquire_atomicrmw:
@@ -1674,6 +1715,7 @@ define amdgpu_kernel void @global_singlethread_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -1696,6 +1738,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -1710,6 +1753,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1720,6 +1764,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1730,6 +1775,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -1747,6 +1793,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1757,6 +1804,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1767,6 +1815,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1777,6 +1826,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1787,6 +1837,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1797,6 +1848,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1807,6 +1859,7 @@ define amdgpu_kernel void @global_singlethread_release_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -1850,7 +1903,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1864,7 +1919,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1874,7 +1931,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1884,7 +1943,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1901,7 +1962,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1911,7 +1974,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1921,7 +1986,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1931,7 +1998,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1941,7 +2010,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1951,7 +2022,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1961,7 +2034,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1972,6 +2047,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acq_rel_atomicrmw:
@@ -1982,6 +2058,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -2004,7 +2081,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2018,7 +2097,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2028,7 +2109,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2038,7 +2121,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2055,7 +2140,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2065,7 +2152,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2075,7 +2164,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2085,7 +2176,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2095,7 +2188,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2105,7 +2200,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2115,7 +2212,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2126,6 +2225,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_seq_cst_atomicrmw:
@@ -2136,6 +2236,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -2175,6 +2276,7 @@ define amdgpu_kernel void @global_singlethread_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2341,6 +2443,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -2357,7 +2460,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2371,6 +2476,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2383,6 +2489,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2402,6 +2509,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -2414,6 +2522,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2426,6 +2535,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2438,6 +2548,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2450,6 +2561,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2462,6 +2574,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2474,6 +2587,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2524,6 +2638,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -2540,7 +2655,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2554,6 +2671,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2566,6 +2684,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2585,6 +2704,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -2597,6 +2717,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2609,6 +2730,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2621,6 +2743,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2633,6 +2756,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2645,6 +2769,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2657,6 +2782,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2932,6 +3058,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -2960,6 +3087,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -2974,6 +3102,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -2988,6 +3117,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3010,6 +3140,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3024,6 +3155,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3038,6 +3170,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3052,6 +3185,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3066,6 +3200,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3080,6 +3215,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3094,6 +3230,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3108,6 +3245,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
@@ -3122,6 +3260,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3150,6 +3289,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -3178,6 +3318,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -3192,6 +3333,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -3206,6 +3348,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -3228,6 +3371,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -3242,6 +3386,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3256,6 +3401,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3270,6 +3416,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3284,6 +3431,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3298,6 +3446,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -3312,6 +3461,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -3369,7 +3519,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3397,7 +3549,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3411,7 +3565,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3425,7 +3581,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3447,7 +3605,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3461,7 +3621,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3475,7 +3637,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3489,7 +3653,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3503,7 +3669,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3517,7 +3685,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3531,7 +3701,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3546,6 +3718,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3560,6 +3733,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3588,7 +3762,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3616,7 +3792,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3630,7 +3808,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3644,7 +3824,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3666,7 +3848,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3680,7 +3864,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3694,7 +3880,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3708,7 +3896,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3722,7 +3912,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3736,7 +3928,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3750,7 +3944,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3765,6 +3961,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3779,6 +3976,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3808,6 +4006,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3836,6 +4035,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3850,6 +4050,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3864,6 +4065,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3886,6 +4088,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3900,6 +4103,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3914,6 +4118,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3928,6 +4133,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3942,6 +4148,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3956,6 +4163,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3970,6 +4178,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3984,6 +4193,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
@@ -3998,6 +4208,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4027,6 +4238,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4055,6 +4267,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4069,6 +4282,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4083,6 +4297,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4105,6 +4320,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4119,6 +4335,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4133,6 +4350,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4147,6 +4365,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4161,6 +4380,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4175,6 +4395,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4189,6 +4410,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4203,6 +4425,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acquire_acquire_cmpxchg:
@@ -4217,6 +4440,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4245,7 +4469,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4273,7 +4499,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4287,7 +4515,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4301,7 +4531,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4323,7 +4555,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4337,7 +4571,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4351,7 +4587,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4365,7 +4603,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4379,7 +4619,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4393,7 +4635,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4407,7 +4651,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4422,6 +4668,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_release_acquire_cmpxchg:
@@ -4436,6 +4683,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4464,7 +4712,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4492,7 +4742,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4506,7 +4758,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4520,7 +4774,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4542,7 +4798,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4556,7 +4814,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4570,7 +4830,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4584,7 +4846,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4598,7 +4862,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4612,7 +4878,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4626,7 +4894,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4641,6 +4911,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
@@ -4655,6 +4926,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4683,7 +4955,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4711,7 +4985,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4725,7 +5001,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4739,7 +5017,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4761,7 +5041,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4775,7 +5057,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4789,7 +5073,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4803,7 +5089,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4817,7 +5105,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4831,7 +5121,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4845,7 +5137,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4860,6 +5154,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
@@ -4874,6 +5169,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4902,7 +5198,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4930,7 +5228,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4944,7 +5244,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4958,7 +5260,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4980,7 +5284,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4994,7 +5300,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5008,7 +5316,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5022,7 +5332,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5036,7 +5348,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5050,7 +5364,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5064,7 +5380,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5079,6 +5397,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
@@ -5093,6 +5412,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5121,7 +5441,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5149,7 +5471,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5163,7 +5487,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5177,7 +5503,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5199,7 +5527,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5213,7 +5543,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5227,7 +5559,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5241,7 +5575,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5255,7 +5591,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5269,7 +5607,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5283,7 +5623,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5298,6 +5640,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
@@ -5312,6 +5655,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5340,7 +5684,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5368,7 +5714,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5382,7 +5730,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5396,7 +5746,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5418,7 +5770,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5432,7 +5786,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5446,7 +5802,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5460,7 +5818,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5474,7 +5834,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5488,7 +5850,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5502,7 +5866,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5517,6 +5883,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_release_seq_cst_cmpxchg:
@@ -5531,6 +5898,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5559,7 +5927,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5587,7 +5957,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5601,7 +5973,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5615,7 +5989,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5637,7 +6013,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5651,7 +6029,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5665,7 +6045,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5679,7 +6061,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5693,7 +6077,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5707,7 +6093,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5721,7 +6109,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5736,6 +6126,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -5750,6 +6141,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5778,7 +6170,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5806,7 +6200,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5820,7 +6216,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5834,7 +6232,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5856,7 +6256,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5870,7 +6272,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5884,7 +6288,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5898,7 +6304,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5912,7 +6320,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5926,7 +6336,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5940,7 +6352,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5955,6 +6369,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5969,6 +6384,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -6249,8 +6665,8 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -6280,6 +6696,7 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -6338,8 +6755,8 @@ define amdgpu_kernel void @global_singlethread_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -6499,6 +6916,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
@@ -6530,6 +6948,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -6548,6 +6967,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6564,6 +6984,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6588,6 +7009,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
@@ -6605,6 +7027,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6621,6 +7044,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6637,6 +7061,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6653,6 +7078,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6669,6 +7095,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -6685,6 +7112,7 @@ define amdgpu_kernel void @global_singlethread_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -6750,9 +7178,10 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -6781,7 +7210,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -6799,6 +7230,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6815,6 +7247,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6839,9 +7272,10 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -6856,6 +7290,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6872,6 +7307,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6888,6 +7324,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6904,6 +7341,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6920,6 +7358,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -6936,6 +7375,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7001,9 +7441,10 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7032,7 +7473,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7050,6 +7493,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7066,6 +7510,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7090,9 +7535,10 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7107,6 +7553,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7123,6 +7570,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7139,6 +7587,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7155,6 +7604,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7171,6 +7621,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7187,6 +7638,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7253,8 +7705,8 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7284,6 +7736,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7342,8 +7795,8 @@ define amdgpu_kernel void @global_singlethread_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7504,8 +7957,8 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7535,6 +7988,7 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7593,8 +8047,8 @@ define amdgpu_kernel void @global_singlethread_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7754,9 +8208,10 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7785,7 +8240,9 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7803,6 +8260,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7819,6 +8277,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7843,9 +8302,10 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7860,6 +8320,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7876,6 +8337,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7892,6 +8354,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7908,6 +8371,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7924,6 +8388,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7940,6 +8405,7 @@ define amdgpu_kernel void @global_singlethread_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8005,9 +8471,10 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8036,7 +8503,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8054,6 +8523,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8070,6 +8540,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8094,9 +8565,10 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8111,6 +8583,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8127,6 +8600,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8143,6 +8617,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8159,6 +8634,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8175,6 +8651,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8191,6 +8668,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8256,9 +8734,10 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8287,7 +8766,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8305,6 +8786,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8321,6 +8803,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8345,9 +8828,10 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8362,6 +8846,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8378,6 +8863,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8394,6 +8880,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8410,6 +8897,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8426,6 +8914,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8442,6 +8931,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8507,9 +8997,10 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8538,7 +9029,9 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8556,6 +9049,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8572,6 +9066,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8596,9 +9091,10 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8613,6 +9109,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8629,6 +9126,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8645,6 +9143,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8661,6 +9160,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8677,6 +9177,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8693,6 +9194,7 @@ define amdgpu_kernel void @global_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8758,9 +9260,10 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8789,7 +9292,9 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8807,6 +9312,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8823,6 +9329,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8847,9 +9354,10 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8864,6 +9372,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8880,6 +9389,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8896,6 +9406,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8912,6 +9423,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8928,6 +9440,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8944,6 +9457,7 @@ define amdgpu_kernel void @global_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9009,9 +9523,10 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9040,7 +9555,9 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9058,6 +9575,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9074,6 +9592,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9098,9 +9617,10 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9115,6 +9635,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9131,6 +9652,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9147,6 +9669,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9163,6 +9686,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9179,6 +9703,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9195,6 +9720,7 @@ define amdgpu_kernel void @global_singlethread_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9260,9 +9786,10 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9291,7 +9818,9 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9309,6 +9838,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9325,6 +9855,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9349,9 +9880,10 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9366,6 +9898,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9382,6 +9915,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9398,6 +9932,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9414,6 +9949,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9430,6 +9966,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9446,6 +9983,7 @@ define amdgpu_kernel void @global_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9511,9 +10049,10 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9542,7 +10081,9 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9560,6 +10101,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9576,6 +10118,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9600,9 +10143,10 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9617,6 +10161,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9633,6 +10178,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9649,6 +10195,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9665,6 +10212,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9681,6 +10229,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9697,6 +10246,7 @@ define amdgpu_kernel void @global_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -10147,6 +10697,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -10315,6 +10866,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 s5, s14
; GFX6-NEXT: s_mov_b32 s6, s13
; GFX6-NEXT: s_mov_b32 s7, s12
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_load_dword v0, off, s[8:11], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -10330,7 +10882,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -10379,6 +10933,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s1, s10
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s9
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s8
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_load_dword v0, off, s[4:7], 0
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -10455,6 +11010,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX12-WGP-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-WGP-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
@@ -10466,6 +11022,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-CU-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -10806,6 +11363,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -10820,6 +11378,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -10830,6 +11389,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10840,6 +11400,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -10858,6 +11419,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10868,6 +11430,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10878,6 +11441,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10888,6 +11452,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10898,6 +11463,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10908,6 +11474,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10918,6 +11485,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -10962,6 +11530,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -10976,6 +11545,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -10986,6 +11556,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10996,6 +11567,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -11014,6 +11586,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11024,6 +11597,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11034,6 +11608,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11044,6 +11619,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11054,6 +11630,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11064,6 +11641,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11074,6 +11652,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -11272,6 +11851,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11286,6 +11866,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11296,6 +11877,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11306,6 +11888,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11323,6 +11906,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11333,6 +11917,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11343,6 +11928,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11353,6 +11939,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11363,6 +11950,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11373,6 +11961,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11383,6 +11972,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11393,6 +11983,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acquire_atomicrmw:
@@ -11403,6 +11994,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -11425,6 +12017,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -11439,6 +12032,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11449,6 +12043,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11459,6 +12054,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -11476,6 +12072,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11486,6 +12083,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11496,6 +12094,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11506,6 +12105,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11516,6 +12116,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11526,6 +12127,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11536,6 +12138,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -11579,7 +12182,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11593,7 +12198,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11603,7 +12210,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11613,7 +12222,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11630,7 +12241,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11640,7 +12253,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11650,7 +12265,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11660,7 +12277,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11670,7 +12289,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11680,7 +12301,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11690,7 +12313,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11701,6 +12326,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
@@ -11711,6 +12337,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -11733,7 +12360,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11747,7 +12376,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11757,7 +12388,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11767,7 +12400,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11784,7 +12419,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11794,7 +12431,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11804,7 +12443,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11814,7 +12455,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11824,7 +12467,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11834,7 +12479,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11844,7 +12491,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11855,6 +12504,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
@@ -11865,6 +12515,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -11904,6 +12555,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12070,6 +12722,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -12086,7 +12739,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12100,6 +12755,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12112,6 +12768,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12131,6 +12788,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -12143,6 +12801,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12155,6 +12814,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12167,6 +12827,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12179,6 +12840,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12191,6 +12853,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12203,6 +12866,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12253,6 +12917,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -12269,7 +12934,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12283,6 +12950,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12295,6 +12963,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12314,6 +12983,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -12326,6 +12996,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12338,6 +13009,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12350,6 +13022,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12362,6 +13035,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12374,6 +13048,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12386,6 +13061,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12661,6 +13337,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12689,6 +13366,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12703,6 +13381,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12717,6 +13396,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12739,6 +13419,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12753,6 +13434,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12767,6 +13449,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12781,6 +13464,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12795,6 +13479,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12809,6 +13494,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12823,6 +13509,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12837,6 +13524,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -12851,6 +13539,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -12879,6 +13568,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -12907,6 +13597,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -12921,6 +13612,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -12935,6 +13627,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -12957,6 +13650,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -12971,6 +13665,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12985,6 +13680,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -12999,6 +13695,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13013,6 +13710,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -13027,6 +13725,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -13041,6 +13740,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -13098,7 +13798,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13126,7 +13828,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13140,7 +13844,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13154,7 +13860,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13176,7 +13884,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13190,7 +13900,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13204,7 +13916,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13218,7 +13932,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13232,7 +13948,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13246,7 +13964,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13260,7 +13980,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13275,6 +13997,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -13289,6 +14012,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13317,7 +14041,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13345,7 +14071,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13359,7 +14087,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13373,7 +14103,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13395,7 +14127,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13409,7 +14143,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13423,7 +14159,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13437,7 +14175,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13451,7 +14191,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13465,7 +14207,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13479,7 +14223,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13494,6 +14240,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -13508,6 +14255,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13537,6 +14285,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13565,6 +14314,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13579,6 +14329,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13593,6 +14344,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13615,6 +14367,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13629,6 +14382,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13643,6 +14397,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13657,6 +14412,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13671,6 +14427,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13685,6 +14442,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13699,6 +14457,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13713,6 +14472,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -13727,6 +14487,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13756,6 +14517,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13784,6 +14546,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13798,6 +14561,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13812,6 +14576,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13834,6 +14599,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13848,6 +14614,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13862,6 +14629,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13876,6 +14644,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13890,6 +14659,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13904,6 +14674,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13918,6 +14689,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13932,6 +14704,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -13946,6 +14719,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13974,7 +14748,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14002,7 +14778,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14016,7 +14794,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14030,7 +14810,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14052,7 +14834,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14066,7 +14850,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14080,7 +14866,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14094,7 +14882,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14108,7 +14898,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14122,7 +14914,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14136,7 +14930,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14151,6 +14947,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
@@ -14165,6 +14962,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14193,7 +14991,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14221,7 +15021,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14235,7 +15037,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14249,7 +15053,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14271,7 +15077,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14285,7 +15093,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14299,7 +15109,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14313,7 +15125,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14327,7 +15141,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14341,7 +15157,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14355,7 +15173,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14370,6 +15190,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -14384,6 +15205,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14412,7 +15234,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14440,7 +15264,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14454,7 +15280,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14468,7 +15296,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14490,7 +15320,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14504,7 +15336,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14518,7 +15352,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14532,7 +15368,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14546,7 +15384,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14560,7 +15400,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14574,7 +15416,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14589,6 +15433,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -14603,6 +15448,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14631,7 +15477,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14659,7 +15507,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14673,7 +15523,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14687,7 +15539,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14709,7 +15563,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14723,7 +15579,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14737,7 +15595,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14751,7 +15611,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14765,7 +15627,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14779,7 +15643,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14793,7 +15659,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14808,6 +15676,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -14822,6 +15691,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14850,7 +15720,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14878,7 +15750,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14892,7 +15766,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14906,7 +15782,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14928,7 +15806,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14942,7 +15822,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14956,7 +15838,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14970,7 +15854,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14984,7 +15870,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -14998,7 +15886,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15012,7 +15902,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15027,6 +15919,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -15041,6 +15934,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15069,7 +15963,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15097,7 +15993,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15111,7 +16009,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15125,7 +16025,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15147,7 +16049,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15161,7 +16065,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15175,7 +16081,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15189,7 +16097,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15203,7 +16113,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15217,7 +16129,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15231,7 +16145,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15246,6 +16162,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -15260,6 +16177,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15288,7 +16206,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15316,7 +16236,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15330,7 +16252,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15344,7 +16268,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15366,7 +16292,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15380,7 +16308,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15394,7 +16324,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15408,7 +16340,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15422,7 +16356,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15436,7 +16372,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15450,7 +16388,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15465,6 +16405,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15479,6 +16420,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15507,7 +16449,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15535,7 +16479,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15549,7 +16495,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15563,7 +16511,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15585,7 +16535,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15599,7 +16551,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15613,7 +16567,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15627,7 +16583,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15641,7 +16599,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15655,7 +16615,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15669,7 +16631,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15684,6 +16648,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15698,6 +16663,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15978,8 +16944,8 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_ret_cmpx
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -16009,6 +16975,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_ret_cmpx
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -16067,8 +17034,8 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_monotonic_ret_cmpx
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -16228,6 +17195,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
@@ -16259,6 +17227,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -16277,6 +17246,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16293,6 +17263,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16317,6 +17288,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
@@ -16334,6 +17306,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16350,6 +17323,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16366,6 +17340,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16382,6 +17357,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16398,6 +17374,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16414,6 +17391,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_monotonic_ret_cmpx
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16479,9 +17457,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -16510,7 +17489,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -16528,6 +17509,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16544,6 +17526,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16568,9 +17551,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -16585,6 +17569,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16601,6 +17586,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16617,6 +17603,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16633,6 +17620,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16649,6 +17637,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16665,6 +17654,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_monotonic_ret_cmpx
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16730,9 +17720,10 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -16761,7 +17752,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -16779,6 +17772,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16795,6 +17789,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16819,9 +17814,10 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -16836,6 +17832,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16852,6 +17849,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16868,6 +17866,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16884,6 +17883,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16900,6 +17900,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16916,6 +17917,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_monotonic_ret_cmpx
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16982,8 +17984,8 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_ret_cmpx
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17013,6 +18015,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_ret_cmpx
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17071,8 +18074,8 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_acquire_ret_cmpx
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17233,8 +18236,8 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_ret_cmpxch
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17264,6 +18267,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17322,8 +18326,8 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_acquire_ret_cmpxch
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17483,9 +18487,10 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17514,7 +18519,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17532,6 +18539,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17548,6 +18556,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17572,9 +18581,10 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17589,6 +18599,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17605,6 +18616,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17621,6 +18633,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17637,6 +18650,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17653,6 +18667,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17669,6 +18684,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_acquire_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17734,9 +18750,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17765,7 +18782,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17783,6 +18802,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17799,6 +18819,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17823,9 +18844,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17840,6 +18862,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17856,6 +18879,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17872,6 +18896,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17888,6 +18913,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17904,6 +18930,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17920,6 +18947,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_acquire_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17985,9 +19013,10 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18016,7 +19045,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18034,6 +19065,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18050,6 +19082,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18074,9 +19107,10 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18091,6 +19125,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18107,6 +19142,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18123,6 +19159,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18139,6 +19176,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18155,6 +19193,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18171,6 +19210,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_acquire_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18236,9 +19276,10 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18267,7 +19308,9 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18285,6 +19328,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18301,6 +19345,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18325,9 +19370,10 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18342,6 +19388,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18358,6 +19405,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18374,6 +19422,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18390,6 +19439,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18406,6 +19456,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18422,6 +19473,7 @@ define amdgpu_kernel void @global_singlethread_one_as_monotonic_seq_cst_ret_cmpx
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18487,9 +19539,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18518,7 +19571,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18536,6 +19591,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18552,6 +19608,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18576,9 +19633,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18593,6 +19651,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18609,6 +19668,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18625,6 +19685,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18641,6 +19702,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18657,6 +19719,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18673,6 +19736,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acquire_seq_cst_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18738,9 +19802,10 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18769,7 +19834,9 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18787,6 +19854,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18803,6 +19871,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18827,9 +19896,10 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18844,6 +19914,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18860,6 +19931,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18876,6 +19948,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18892,6 +19965,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18908,6 +19982,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18924,6 +19999,7 @@ define amdgpu_kernel void @global_singlethread_one_as_release_seq_cst_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18989,9 +20065,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19020,7 +20097,9 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19038,6 +20117,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19054,6 +20134,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19078,9 +20159,10 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19095,6 +20177,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19111,6 +20194,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19127,6 +20211,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19143,6 +20228,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19159,6 +20245,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19175,6 +20262,7 @@ define amdgpu_kernel void @global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19240,9 +20328,10 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19271,7 +20360,9 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19289,6 +20380,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19305,6 +20397,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19329,9 +20422,10 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19346,6 +20440,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19362,6 +20457,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19378,6 +20474,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19394,6 +20491,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19410,6 +20508,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19426,6 +20525,7 @@ define amdgpu_kernel void @global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxch
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
index be148464c156e..3601ac06fc0b0 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
@@ -1684,6 +1684,7 @@ define amdgpu_kernel void @global_system_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -1697,6 +1698,7 @@ define amdgpu_kernel void @global_system_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -1778,6 +1780,7 @@ define amdgpu_kernel void @global_system_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -1791,6 +1794,7 @@ define amdgpu_kernel void @global_system_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -1804,6 +1808,7 @@ define amdgpu_kernel void @global_system_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -1816,6 +1821,7 @@ define amdgpu_kernel void @global_system_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -2056,6 +2062,7 @@ define amdgpu_kernel void @global_system_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -2071,6 +2078,7 @@ define amdgpu_kernel void @global_system_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -2163,6 +2171,7 @@ define amdgpu_kernel void @global_system_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -2178,6 +2187,7 @@ define amdgpu_kernel void @global_system_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -2196,6 +2206,7 @@ define amdgpu_kernel void @global_system_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -2213,6 +2224,7 @@ define amdgpu_kernel void @global_system_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -2270,6 +2282,7 @@ define amdgpu_kernel void @global_system_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -2285,6 +2298,7 @@ define amdgpu_kernel void @global_system_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -2377,6 +2391,7 @@ define amdgpu_kernel void @global_system_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -2392,6 +2407,7 @@ define amdgpu_kernel void @global_system_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -2410,6 +2426,7 @@ define amdgpu_kernel void @global_system_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -2427,6 +2444,7 @@ define amdgpu_kernel void @global_system_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -3391,6 +3409,7 @@ define amdgpu_kernel void @global_system_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -3408,6 +3427,7 @@ define amdgpu_kernel void @global_system_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -3514,6 +3534,7 @@ define amdgpu_kernel void @global_system_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -3531,6 +3552,7 @@ define amdgpu_kernel void @global_system_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -3548,6 +3570,7 @@ define amdgpu_kernel void @global_system_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -3564,6 +3587,7 @@ define amdgpu_kernel void @global_system_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -3893,6 +3917,7 @@ define amdgpu_kernel void @global_system_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -3912,6 +3937,7 @@ define amdgpu_kernel void @global_system_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4029,6 +4055,7 @@ define amdgpu_kernel void @global_system_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4048,6 +4075,7 @@ define amdgpu_kernel void @global_system_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4070,6 +4098,7 @@ define amdgpu_kernel void @global_system_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -4091,6 +4120,7 @@ define amdgpu_kernel void @global_system_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -4172,6 +4202,7 @@ define amdgpu_kernel void @global_system_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4191,6 +4222,7 @@ define amdgpu_kernel void @global_system_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4308,6 +4340,7 @@ define amdgpu_kernel void @global_system_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4327,6 +4360,7 @@ define amdgpu_kernel void @global_system_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4349,6 +4383,7 @@ define amdgpu_kernel void @global_system_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -4370,6 +4405,7 @@ define amdgpu_kernel void @global_system_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -4447,6 +4483,7 @@ define amdgpu_kernel void @global_system_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4464,6 +4501,7 @@ define amdgpu_kernel void @global_system_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4570,6 +4608,7 @@ define amdgpu_kernel void @global_system_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4587,6 +4626,7 @@ define amdgpu_kernel void @global_system_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4604,6 +4644,7 @@ define amdgpu_kernel void @global_system_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -4620,6 +4661,7 @@ define amdgpu_kernel void @global_system_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -4697,6 +4739,7 @@ define amdgpu_kernel void @global_system_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4714,6 +4757,7 @@ define amdgpu_kernel void @global_system_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -4820,6 +4864,7 @@ define amdgpu_kernel void @global_system_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -4837,6 +4882,7 @@ define amdgpu_kernel void @global_system_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -4854,6 +4900,7 @@ define amdgpu_kernel void @global_system_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -4870,6 +4917,7 @@ define amdgpu_kernel void @global_system_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -4951,6 +4999,7 @@ define amdgpu_kernel void @global_system_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -4970,6 +5019,7 @@ define amdgpu_kernel void @global_system_release_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5087,6 +5137,7 @@ define amdgpu_kernel void @global_system_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5106,6 +5157,7 @@ define amdgpu_kernel void @global_system_release_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5128,6 +5180,7 @@ define amdgpu_kernel void @global_system_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -5149,6 +5202,7 @@ define amdgpu_kernel void @global_system_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -5230,6 +5284,7 @@ define amdgpu_kernel void @global_system_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5249,6 +5304,7 @@ define amdgpu_kernel void @global_system_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5366,6 +5422,7 @@ define amdgpu_kernel void @global_system_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5385,6 +5442,7 @@ define amdgpu_kernel void @global_system_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5407,6 +5465,7 @@ define amdgpu_kernel void @global_system_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -5428,6 +5487,7 @@ define amdgpu_kernel void @global_system_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -5509,6 +5569,7 @@ define amdgpu_kernel void @global_system_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5528,6 +5589,7 @@ define amdgpu_kernel void @global_system_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5645,6 +5707,7 @@ define amdgpu_kernel void @global_system_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5664,6 +5727,7 @@ define amdgpu_kernel void @global_system_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5686,6 +5750,7 @@ define amdgpu_kernel void @global_system_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -5707,6 +5772,7 @@ define amdgpu_kernel void @global_system_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -5788,6 +5854,7 @@ define amdgpu_kernel void @global_system_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -5807,6 +5874,7 @@ define amdgpu_kernel void @global_system_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -5924,6 +5992,7 @@ define amdgpu_kernel void @global_system_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -5943,6 +6012,7 @@ define amdgpu_kernel void @global_system_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -5965,6 +6035,7 @@ define amdgpu_kernel void @global_system_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -5986,6 +6057,7 @@ define amdgpu_kernel void @global_system_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -11745,6 +11817,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -11758,6 +11831,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -11839,6 +11913,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -11852,6 +11927,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -11865,6 +11941,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -11877,6 +11954,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -12117,6 +12195,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -12132,6 +12211,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -12224,6 +12304,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -12239,6 +12320,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -12257,6 +12339,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -12274,6 +12357,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -12331,6 +12415,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -12346,6 +12431,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -12438,6 +12524,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -12453,6 +12540,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -12471,6 +12559,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -12488,6 +12577,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -13452,6 +13542,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -13469,6 +13560,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -13575,6 +13667,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -13592,6 +13685,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -13609,6 +13703,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -13625,6 +13720,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -13954,6 +14050,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -13973,6 +14070,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14090,6 +14188,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14109,6 +14208,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14131,6 +14231,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -14152,6 +14253,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -14233,6 +14335,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14252,6 +14355,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14369,6 +14473,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14388,6 +14493,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14410,6 +14516,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -14431,6 +14538,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -14508,6 +14616,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14525,6 +14634,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14631,6 +14741,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14648,6 +14759,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14665,6 +14777,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -14681,6 +14794,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -14758,6 +14872,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -14775,6 +14890,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -14881,6 +14997,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -14898,6 +15015,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -14915,6 +15033,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -14931,6 +15050,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -15012,6 +15132,7 @@ define amdgpu_kernel void @global_system_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15031,6 +15152,7 @@ define amdgpu_kernel void @global_system_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15148,6 +15270,7 @@ define amdgpu_kernel void @global_system_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15167,6 +15290,7 @@ define amdgpu_kernel void @global_system_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15189,6 +15313,7 @@ define amdgpu_kernel void @global_system_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -15210,6 +15335,7 @@ define amdgpu_kernel void @global_system_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -15291,6 +15417,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15310,6 +15437,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15427,6 +15555,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15446,6 +15575,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15468,6 +15598,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -15489,6 +15620,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -15570,6 +15702,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15589,6 +15722,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15706,6 +15840,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -15725,6 +15860,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -15747,6 +15883,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -15768,6 +15905,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -15849,6 +15987,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -15868,6 +16007,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -15985,6 +16125,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16004,6 +16145,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16026,6 +16168,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16047,6 +16190,7 @@ define amdgpu_kernel void @global_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16128,6 +16272,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16147,6 +16292,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16264,6 +16410,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16283,6 +16430,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16305,6 +16453,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16326,6 +16475,7 @@ define amdgpu_kernel void @global_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16407,6 +16557,7 @@ define amdgpu_kernel void @global_system_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16426,6 +16577,7 @@ define amdgpu_kernel void @global_system_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16543,6 +16695,7 @@ define amdgpu_kernel void @global_system_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16562,6 +16715,7 @@ define amdgpu_kernel void @global_system_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16584,6 +16738,7 @@ define amdgpu_kernel void @global_system_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16605,6 +16760,7 @@ define amdgpu_kernel void @global_system_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16686,6 +16842,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16705,6 +16862,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -16822,6 +16980,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -16841,6 +17000,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -16863,6 +17023,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -16884,6 +17045,7 @@ define amdgpu_kernel void @global_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
@@ -16965,6 +17127,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl1_inv
; GFX10-WGP-NEXT: buffer_gl0_inv
@@ -16984,6 +17147,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: buffer_gl1_inv
; GFX10-CU-NEXT: buffer_gl0_inv
@@ -17101,6 +17265,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl1_inv
; GFX11-WGP-NEXT: buffer_gl0_inv
@@ -17120,6 +17285,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: buffer_gl1_inv
; GFX11-CU-NEXT: buffer_gl0_inv
@@ -17142,6 +17308,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SYS
; GFX12-WGP-NEXT: s_endpgm
@@ -17163,6 +17330,7 @@ define amdgpu_kernel void @global_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_SYS
; GFX12-CU-NEXT: s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
index 8a5c5dda9f79c..98b26224ac71c 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
@@ -403,6 +403,7 @@ define amdgpu_kernel void @global_volatile_store_0(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: s_endpgm
;
@@ -416,6 +417,7 @@ define amdgpu_kernel void @global_volatile_store_0(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: s_endpgm
;
@@ -450,6 +452,7 @@ define amdgpu_kernel void @global_volatile_store_0(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: s_endpgm
;
@@ -463,6 +466,7 @@ define amdgpu_kernel void @global_volatile_store_0(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: s_endpgm
;
@@ -481,6 +485,7 @@ define amdgpu_kernel void @global_volatile_store_0(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
@@ -499,6 +504,7 @@ define amdgpu_kernel void @global_volatile_store_0(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %in, ptr addrspace(1) %out) {
@@ -576,6 +582,7 @@ define amdgpu_kernel void @global_volatile_store_1(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: s_endpgm
;
@@ -590,6 +597,7 @@ define amdgpu_kernel void @global_volatile_store_1(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: s_endpgm
;
@@ -631,6 +639,7 @@ define amdgpu_kernel void @global_volatile_store_1(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: s_endpgm
;
@@ -647,6 +656,7 @@ define amdgpu_kernel void @global_volatile_store_1(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: s_endpgm
;
@@ -669,6 +679,7 @@ define amdgpu_kernel void @global_volatile_store_1(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
@@ -691,6 +702,7 @@ define amdgpu_kernel void @global_volatile_store_1(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %in, ptr addrspace(1) %out) {
@@ -739,6 +751,7 @@ define amdgpu_kernel void @global_volatile_workgroup_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -966,7 +979,7 @@ define amdgpu_kernel void @global_volatile_workgroup_release_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(1) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
index 151ba07a0b531..df8d39b2c152a 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
@@ -418,6 +418,7 @@ define amdgpu_kernel void @global_wavefront_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -586,6 +587,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_load(
; GFX6-NEXT: s_mov_b32 s5, s14
; GFX6-NEXT: s_mov_b32 s6, s13
; GFX6-NEXT: s_mov_b32 s7, s12
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_load_dword v0, off, s[8:11], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -601,7 +603,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -650,6 +654,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s1, s10
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s9
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s8
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_load_dword v0, off, s[4:7], 0
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -726,6 +731,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX12-WGP-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-WGP-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
@@ -737,6 +743,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-CU-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -1077,6 +1084,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -1091,6 +1099,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1101,6 +1110,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1111,6 +1121,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -1129,6 +1140,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1139,6 +1151,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1149,6 +1162,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1159,6 +1173,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1169,6 +1184,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1179,6 +1195,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1189,6 +1206,7 @@ define amdgpu_kernel void @global_wavefront_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -1233,6 +1251,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -1247,6 +1266,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1257,6 +1277,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1267,6 +1288,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -1285,6 +1307,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1295,6 +1318,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1305,6 +1329,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1315,6 +1340,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1325,6 +1351,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1335,6 +1362,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1345,6 +1373,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -1543,6 +1572,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1557,6 +1587,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1567,6 +1598,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1577,6 +1609,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1594,6 +1627,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1604,6 +1638,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1614,6 +1649,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1624,6 +1660,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1634,6 +1671,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1644,6 +1682,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1654,6 +1693,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1664,6 +1704,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acquire_atomicrmw:
@@ -1674,6 +1715,7 @@ define amdgpu_kernel void @global_wavefront_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -1696,6 +1738,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -1710,6 +1753,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -1720,6 +1764,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1730,6 +1775,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -1747,6 +1793,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1757,6 +1804,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1767,6 +1815,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1777,6 +1826,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1787,6 +1837,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1797,6 +1848,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1807,6 +1859,7 @@ define amdgpu_kernel void @global_wavefront_release_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -1850,7 +1903,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1864,7 +1919,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1874,7 +1931,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1884,7 +1943,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1901,7 +1962,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1911,7 +1974,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1921,7 +1986,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1931,7 +1998,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1941,7 +2010,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1951,7 +2022,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1961,7 +2034,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1972,6 +2047,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acq_rel_atomicrmw:
@@ -1982,6 +2058,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -2004,7 +2081,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2018,7 +2097,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2028,7 +2109,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2038,7 +2121,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2055,7 +2140,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2065,7 +2152,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2075,7 +2164,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2085,7 +2176,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2095,7 +2188,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2105,7 +2200,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2115,7 +2212,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2126,6 +2225,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_seq_cst_atomicrmw:
@@ -2136,6 +2236,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -2175,6 +2276,7 @@ define amdgpu_kernel void @global_wavefront_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2341,6 +2443,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -2357,7 +2460,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2371,6 +2476,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2383,6 +2489,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2402,6 +2509,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -2414,6 +2522,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2426,6 +2535,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2438,6 +2548,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2450,6 +2561,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2462,6 +2574,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2474,6 +2587,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2524,6 +2638,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -2540,7 +2655,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2554,6 +2671,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2566,6 +2684,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2585,6 +2704,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -2597,6 +2717,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2609,6 +2730,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -2621,6 +2743,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2633,6 +2756,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -2645,6 +2769,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2657,6 +2782,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2932,6 +3058,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -2960,6 +3087,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -2974,6 +3102,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -2988,6 +3117,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3010,6 +3140,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3024,6 +3155,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3038,6 +3170,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3052,6 +3185,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3066,6 +3200,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3080,6 +3215,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3094,6 +3230,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3108,6 +3245,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
@@ -3122,6 +3260,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3150,6 +3289,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -3178,6 +3318,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -3192,6 +3333,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -3206,6 +3348,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -3228,6 +3371,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -3242,6 +3386,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3256,6 +3401,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3270,6 +3416,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -3284,6 +3431,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3298,6 +3446,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -3312,6 +3461,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -3369,7 +3519,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3397,7 +3549,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3411,7 +3565,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3425,7 +3581,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3447,7 +3605,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3461,7 +3621,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3475,7 +3637,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3489,7 +3653,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3503,7 +3669,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3517,7 +3685,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3531,7 +3701,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3546,6 +3718,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3560,6 +3733,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3588,7 +3762,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3616,7 +3792,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3630,7 +3808,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3644,7 +3824,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3666,7 +3848,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3680,7 +3864,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3694,7 +3880,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3708,7 +3896,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3722,7 +3912,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3736,7 +3928,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3750,7 +3944,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3765,6 +3961,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3779,6 +3976,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3808,6 +4006,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3836,6 +4035,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3850,6 +4050,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3864,6 +4065,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3886,6 +4088,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3900,6 +4103,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3914,6 +4118,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3928,6 +4133,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3942,6 +4148,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3956,6 +4163,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3970,6 +4178,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3984,6 +4193,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
@@ -3998,6 +4208,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4027,6 +4238,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4055,6 +4267,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4069,6 +4282,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4083,6 +4297,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4105,6 +4320,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4119,6 +4335,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4133,6 +4350,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4147,6 +4365,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4161,6 +4380,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4175,6 +4395,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4189,6 +4410,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4203,6 +4425,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acquire_acquire_cmpxchg:
@@ -4217,6 +4440,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4245,7 +4469,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4273,7 +4499,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4287,7 +4515,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4301,7 +4531,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4323,7 +4555,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4337,7 +4571,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4351,7 +4587,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4365,7 +4603,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4379,7 +4619,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4393,7 +4635,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4407,7 +4651,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4422,6 +4668,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_release_acquire_cmpxchg:
@@ -4436,6 +4683,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4464,7 +4712,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4492,7 +4742,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4506,7 +4758,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4520,7 +4774,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4542,7 +4798,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4556,7 +4814,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4570,7 +4830,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4584,7 +4846,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4598,7 +4862,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4612,7 +4878,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4626,7 +4894,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4641,6 +4911,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
@@ -4655,6 +4926,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4683,7 +4955,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4711,7 +4985,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4725,7 +5001,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4739,7 +5017,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4761,7 +5041,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4775,7 +5057,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4789,7 +5073,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4803,7 +5089,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4817,7 +5105,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4831,7 +5121,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4845,7 +5137,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4860,6 +5154,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
@@ -4874,6 +5169,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4902,7 +5198,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4930,7 +5228,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4944,7 +5244,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4958,7 +5260,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4980,7 +5284,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4994,7 +5300,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5008,7 +5316,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5022,7 +5332,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5036,7 +5348,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5050,7 +5364,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5064,7 +5380,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5079,6 +5397,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
@@ -5093,6 +5412,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5121,7 +5441,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5149,7 +5471,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5163,7 +5487,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5177,7 +5503,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5199,7 +5527,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5213,7 +5543,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5227,7 +5559,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5241,7 +5575,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5255,7 +5591,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5269,7 +5607,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5283,7 +5623,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5298,6 +5640,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
@@ -5312,6 +5655,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5340,7 +5684,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5368,7 +5714,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5382,7 +5730,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5396,7 +5746,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5418,7 +5770,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5432,7 +5786,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5446,7 +5802,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5460,7 +5818,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5474,7 +5834,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5488,7 +5850,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5502,7 +5866,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5517,6 +5883,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_release_seq_cst_cmpxchg:
@@ -5531,6 +5898,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5559,7 +5927,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5587,7 +5957,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5601,7 +5973,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5615,7 +5989,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5637,7 +6013,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5651,7 +6029,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5665,7 +6045,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5679,7 +6061,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5693,7 +6077,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5707,7 +6093,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5721,7 +6109,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5736,6 +6126,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -5750,6 +6141,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5778,7 +6170,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5806,7 +6200,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5820,7 +6216,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5834,7 +6232,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5856,7 +6256,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5870,7 +6272,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5884,7 +6288,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5898,7 +6304,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5912,7 +6320,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5926,7 +6336,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5940,7 +6352,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5955,6 +6369,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5969,6 +6384,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -6249,8 +6665,8 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -6280,6 +6696,7 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -6338,8 +6755,8 @@ define amdgpu_kernel void @global_wavefront_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -6499,6 +6916,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
@@ -6530,6 +6948,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -6548,6 +6967,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6564,6 +6984,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6588,6 +7009,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
@@ -6605,6 +7027,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6621,6 +7044,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6637,6 +7061,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6653,6 +7078,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6669,6 +7095,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -6685,6 +7112,7 @@ define amdgpu_kernel void @global_wavefront_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -6750,9 +7178,10 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -6781,7 +7210,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -6799,6 +7230,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6815,6 +7247,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6839,9 +7272,10 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -6856,6 +7290,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6872,6 +7307,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -6888,6 +7324,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6904,6 +7341,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -6920,6 +7358,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -6936,6 +7375,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7001,9 +7441,10 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7032,7 +7473,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7050,6 +7493,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7066,6 +7510,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7090,9 +7535,10 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7107,6 +7553,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7123,6 +7570,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7139,6 +7587,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7155,6 +7604,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7171,6 +7621,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7187,6 +7638,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7253,8 +7705,8 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7284,6 +7736,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7342,8 +7795,8 @@ define amdgpu_kernel void @global_wavefront_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7504,8 +7957,8 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7535,6 +7988,7 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7593,8 +8047,8 @@ define amdgpu_kernel void @global_wavefront_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7754,9 +8208,10 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7785,7 +8240,9 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7803,6 +8260,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7819,6 +8277,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7843,9 +8302,10 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7860,6 +8320,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7876,6 +8337,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -7892,6 +8354,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7908,6 +8371,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -7924,6 +8388,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7940,6 +8405,7 @@ define amdgpu_kernel void @global_wavefront_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8005,9 +8471,10 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8036,7 +8503,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8054,6 +8523,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8070,6 +8540,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8094,9 +8565,10 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8111,6 +8583,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8127,6 +8600,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8143,6 +8617,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8159,6 +8634,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8175,6 +8651,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8191,6 +8668,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8256,9 +8734,10 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8287,7 +8766,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8305,6 +8786,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8321,6 +8803,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8345,9 +8828,10 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8362,6 +8846,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8378,6 +8863,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8394,6 +8880,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8410,6 +8897,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8426,6 +8914,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8442,6 +8931,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8507,9 +8997,10 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8538,7 +9029,9 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8556,6 +9049,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8572,6 +9066,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8596,9 +9091,10 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8613,6 +9109,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8629,6 +9126,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8645,6 +9143,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8661,6 +9160,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8677,6 +9177,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8693,6 +9194,7 @@ define amdgpu_kernel void @global_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8758,9 +9260,10 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8789,7 +9292,9 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8807,6 +9312,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8823,6 +9329,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8847,9 +9354,10 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8864,6 +9372,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8880,6 +9389,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -8896,6 +9406,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8912,6 +9423,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -8928,6 +9440,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8944,6 +9457,7 @@ define amdgpu_kernel void @global_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9009,9 +9523,10 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9040,7 +9555,9 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9058,6 +9575,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9074,6 +9592,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9098,9 +9617,10 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9115,6 +9635,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9131,6 +9652,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9147,6 +9669,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9163,6 +9686,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9179,6 +9703,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9195,6 +9720,7 @@ define amdgpu_kernel void @global_wavefront_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9260,9 +9786,10 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9291,7 +9818,9 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9309,6 +9838,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9325,6 +9855,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9349,9 +9880,10 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9366,6 +9898,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9382,6 +9915,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9398,6 +9932,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9414,6 +9949,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9430,6 +9966,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9446,6 +9983,7 @@ define amdgpu_kernel void @global_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9511,9 +10049,10 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9542,7 +10081,9 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9560,6 +10101,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9576,6 +10118,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9600,9 +10143,10 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9617,6 +10161,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9633,6 +10178,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -9649,6 +10195,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9665,6 +10212,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -9681,6 +10229,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9697,6 +10246,7 @@ define amdgpu_kernel void @global_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -10147,6 +10697,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -10315,6 +10866,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 s5, s14
; GFX6-NEXT: s_mov_b32 s6, s13
; GFX6-NEXT: s_mov_b32 s7, s12
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_load_dword v0, off, s[8:11], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -10330,7 +10882,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -10379,6 +10933,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s1, s10
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s9
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s8
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_load_dword v0, off, s[4:7], 0
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -10455,6 +11010,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_load(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX12-WGP-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-WGP-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
@@ -10466,6 +11022,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-CU-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -10806,6 +11363,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -10820,6 +11378,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -10830,6 +11389,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10840,6 +11400,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -10858,6 +11419,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10868,6 +11430,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10878,6 +11441,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10888,6 +11452,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10898,6 +11463,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10908,6 +11474,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10918,6 +11485,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -10962,6 +11530,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -10976,6 +11545,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -10986,6 +11556,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10996,6 +11567,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -11014,6 +11586,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11024,6 +11597,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11034,6 +11608,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11044,6 +11619,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11054,6 +11630,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11064,6 +11641,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, 0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11074,6 +11652,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -11272,6 +11851,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11286,6 +11866,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11296,6 +11877,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11306,6 +11888,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11323,6 +11906,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11333,6 +11917,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11343,6 +11928,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11353,6 +11939,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11363,6 +11950,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11373,6 +11961,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11383,6 +11972,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11393,6 +11983,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acquire_atomicrmw:
@@ -11403,6 +11994,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -11425,6 +12017,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -11439,6 +12032,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11449,6 +12043,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11459,6 +12054,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -11476,6 +12072,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11486,6 +12083,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11496,6 +12094,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11506,6 +12105,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11516,6 +12116,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11526,6 +12127,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11536,6 +12138,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -11579,7 +12182,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11593,7 +12198,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11603,7 +12210,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11613,7 +12222,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11630,7 +12241,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11640,7 +12253,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11650,7 +12265,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11660,7 +12277,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11670,7 +12289,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11680,7 +12301,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11690,7 +12313,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11701,6 +12326,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
@@ -11711,6 +12337,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -11733,7 +12360,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11747,7 +12376,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11757,7 +12388,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11767,7 +12400,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11784,7 +12419,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11794,7 +12431,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11804,7 +12443,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11814,7 +12455,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11824,7 +12467,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11834,7 +12479,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11844,7 +12491,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11855,6 +12504,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
@@ -11865,6 +12515,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -11904,6 +12555,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12070,6 +12722,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -12086,7 +12739,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12100,6 +12755,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12112,6 +12768,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12131,6 +12788,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -12143,6 +12801,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12155,6 +12814,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12167,6 +12827,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12179,6 +12840,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12191,6 +12853,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12203,6 +12866,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12253,6 +12917,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -12269,7 +12934,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12283,6 +12950,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12295,6 +12963,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12314,6 +12983,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -12326,6 +12996,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12338,6 +13009,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -12350,6 +13022,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12362,6 +13035,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -12374,6 +13048,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12386,6 +13061,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -12661,6 +13337,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12689,6 +13366,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12703,6 +13381,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12717,6 +13396,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12739,6 +13419,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12753,6 +13434,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12767,6 +13449,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12781,6 +13464,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12795,6 +13479,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12809,6 +13494,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12823,6 +13509,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12837,6 +13524,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -12851,6 +13539,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -12879,6 +13568,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -12907,6 +13597,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -12921,6 +13612,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -12935,6 +13627,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -12957,6 +13650,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -12971,6 +13665,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12985,6 +13680,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -12999,6 +13695,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13013,6 +13710,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -13027,6 +13725,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -13041,6 +13740,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -13098,7 +13798,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13126,7 +13828,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13140,7 +13844,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13154,7 +13860,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13176,7 +13884,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13190,7 +13900,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13204,7 +13916,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13218,7 +13932,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13232,7 +13948,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13246,7 +13964,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13260,7 +13980,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13275,6 +13997,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -13289,6 +14012,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13317,7 +14041,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13345,7 +14071,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13359,7 +14087,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13373,7 +14103,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13395,7 +14127,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13409,7 +14143,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13423,7 +14159,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13437,7 +14175,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13451,7 +14191,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13465,7 +14207,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13479,7 +14223,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13494,6 +14240,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -13508,6 +14255,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13537,6 +14285,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13565,6 +14314,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13579,6 +14329,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13593,6 +14344,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13615,6 +14367,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13629,6 +14382,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13643,6 +14397,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13657,6 +14412,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13671,6 +14427,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13685,6 +14442,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13699,6 +14457,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13713,6 +14472,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -13727,6 +14487,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13756,6 +14517,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13784,6 +14546,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13798,6 +14561,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13812,6 +14576,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13834,6 +14599,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13848,6 +14614,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13862,6 +14629,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13876,6 +14644,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13890,6 +14659,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13904,6 +14674,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13918,6 +14689,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13932,6 +14704,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -13946,6 +14719,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13974,7 +14748,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14002,7 +14778,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14016,7 +14794,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14030,7 +14810,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14052,7 +14834,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14066,7 +14850,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14080,7 +14866,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14094,7 +14882,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14108,7 +14898,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14122,7 +14914,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14136,7 +14930,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14151,6 +14947,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
@@ -14165,6 +14962,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14193,7 +14991,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14221,7 +15021,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14235,7 +15037,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14249,7 +15053,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14271,7 +15077,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14285,7 +15093,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14299,7 +15109,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14313,7 +15125,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14327,7 +15141,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14341,7 +15157,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14355,7 +15173,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14370,6 +15190,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -14384,6 +15205,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14412,7 +15234,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14440,7 +15264,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14454,7 +15280,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14468,7 +15296,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14490,7 +15320,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14504,7 +15336,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14518,7 +15352,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14532,7 +15368,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14546,7 +15384,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14560,7 +15400,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14574,7 +15416,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14589,6 +15433,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -14603,6 +15448,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14631,7 +15477,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14659,7 +15507,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14673,7 +15523,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14687,7 +15539,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14709,7 +15563,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14723,7 +15579,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14737,7 +15595,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14751,7 +15611,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14765,7 +15627,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14779,7 +15643,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14793,7 +15659,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14808,6 +15676,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -14822,6 +15691,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14850,7 +15720,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14878,7 +15750,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14892,7 +15766,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14906,7 +15782,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14928,7 +15806,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14942,7 +15822,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14956,7 +15838,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14970,7 +15854,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14984,7 +15870,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -14998,7 +15886,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15012,7 +15902,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15027,6 +15919,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -15041,6 +15934,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15069,7 +15963,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15097,7 +15993,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15111,7 +16009,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15125,7 +16025,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15147,7 +16049,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15161,7 +16065,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15175,7 +16081,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15189,7 +16097,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15203,7 +16113,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15217,7 +16129,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15231,7 +16145,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15246,6 +16162,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -15260,6 +16177,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15288,7 +16206,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15316,7 +16236,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15330,7 +16252,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15344,7 +16268,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15366,7 +16292,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15380,7 +16308,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15394,7 +16324,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15408,7 +16340,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15422,7 +16356,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15436,7 +16372,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15450,7 +16388,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15465,6 +16405,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -15479,6 +16420,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15507,7 +16449,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15535,7 +16479,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15549,7 +16495,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15563,7 +16511,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15585,7 +16535,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15599,7 +16551,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15613,7 +16567,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15627,7 +16583,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15641,7 +16599,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15655,7 +16615,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15669,7 +16631,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15684,6 +16648,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -15698,6 +16663,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15978,8 +16944,8 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_ret_cmpxchg
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -16009,6 +16975,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -16067,8 +17034,8 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -16228,6 +17195,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
@@ -16259,6 +17227,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -16277,6 +17246,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16293,6 +17263,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16317,6 +17288,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
@@ -16334,6 +17306,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16350,6 +17323,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16366,6 +17340,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16382,6 +17357,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16398,6 +17374,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16414,6 +17391,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_monotonic_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16479,9 +17457,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -16510,7 +17489,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -16528,6 +17509,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16544,6 +17526,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16568,9 +17551,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -16585,6 +17569,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16601,6 +17586,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16617,6 +17603,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16633,6 +17620,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16649,6 +17637,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16665,6 +17654,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16730,9 +17720,10 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -16761,7 +17752,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -16779,6 +17772,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16795,6 +17789,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16819,9 +17814,10 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -16836,6 +17832,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16852,6 +17849,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -16868,6 +17866,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16884,6 +17883,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -16900,6 +17900,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16916,6 +17917,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -16982,8 +17984,8 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_ret_cmpxchg
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17013,6 +18015,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17071,8 +18074,8 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_acquire_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17233,8 +18236,8 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17264,6 +18267,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17322,8 +18326,8 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17483,9 +18487,10 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17514,7 +18519,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17532,6 +18539,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17548,6 +18556,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17572,9 +18581,10 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17589,6 +18599,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17605,6 +18616,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17621,6 +18633,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17637,6 +18650,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17653,6 +18667,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17669,6 +18684,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17734,9 +18750,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17765,7 +18782,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17783,6 +18802,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17799,6 +18819,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17823,9 +18844,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17840,6 +18862,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17856,6 +18879,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17872,6 +18896,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17888,6 +18913,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17904,6 +18930,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17920,6 +18947,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17985,9 +19013,10 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18016,7 +19045,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18034,6 +19065,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18050,6 +19082,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18074,9 +19107,10 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18091,6 +19125,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18107,6 +19142,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18123,6 +19159,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18139,6 +19176,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18155,6 +19193,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18171,6 +19210,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18236,9 +19276,10 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18267,7 +19308,9 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18285,6 +19328,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18301,6 +19345,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18325,9 +19370,10 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18342,6 +19388,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18358,6 +19405,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18374,6 +19422,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18390,6 +19439,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18406,6 +19456,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18422,6 +19473,7 @@ define amdgpu_kernel void @global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18487,9 +19539,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18518,7 +19571,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18536,6 +19591,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18552,6 +19608,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18576,9 +19633,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18593,6 +19651,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18609,6 +19668,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18625,6 +19685,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18641,6 +19702,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18657,6 +19719,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18673,6 +19736,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18738,9 +19802,10 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18769,7 +19834,9 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18787,6 +19854,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18803,6 +19871,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18827,9 +19896,10 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18844,6 +19914,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18860,6 +19931,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18876,6 +19948,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18892,6 +19965,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18908,6 +19982,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18924,6 +19999,7 @@ define amdgpu_kernel void @global_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18989,9 +20065,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19020,7 +20097,9 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19038,6 +20117,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19054,6 +20134,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19078,9 +20159,10 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19095,6 +20177,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19111,6 +20194,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19127,6 +20211,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19143,6 +20228,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19159,6 +20245,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19175,6 +20262,7 @@ define amdgpu_kernel void @global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19240,9 +20328,10 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19271,7 +20360,9 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19289,6 +20380,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v3, s6
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19305,6 +20397,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19329,9 +20422,10 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19346,6 +20440,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19362,6 +20457,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19378,6 +20474,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19394,6 +20491,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19410,6 +20508,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v3, s2
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19426,6 +20525,7 @@ define amdgpu_kernel void @global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
index 69b0c7f93ab0e..fe65317212f22 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
@@ -418,6 +418,7 @@ define amdgpu_kernel void @global_workgroup_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -609,6 +610,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_load(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -758,7 +760,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-CU-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -1248,7 +1250,7 @@ define amdgpu_kernel void @global_workgroup_release_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(1) %out) {
@@ -1422,7 +1424,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(1) %out) {
@@ -1601,6 +1603,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1615,6 +1618,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1625,6 +1629,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -1637,6 +1642,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1654,6 +1660,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1664,6 +1671,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1686,6 +1694,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1708,6 +1717,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -1720,6 +1730,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acquire_atomicrmw:
@@ -1730,6 +1741,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -1742,6 +1754,7 @@ define amdgpu_kernel void @global_workgroup_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -1912,7 +1925,7 @@ define amdgpu_kernel void @global_workgroup_release_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
@@ -1938,6 +1951,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -1953,6 +1967,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -1965,6 +1980,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -1978,6 +1994,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -1996,6 +2013,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -2007,6 +2025,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -2031,6 +2050,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -2056,6 +2076,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -2069,6 +2090,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acq_rel_atomicrmw:
@@ -2083,6 +2105,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -2094,8 +2117,9 @@ define amdgpu_kernel void @global_workgroup_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -2120,6 +2144,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2135,6 +2160,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2147,6 +2173,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -2160,6 +2187,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2178,6 +2206,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2189,6 +2218,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2213,6 +2243,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2238,6 +2269,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -2251,6 +2283,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_seq_cst_atomicrmw:
@@ -2265,6 +2298,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -2276,8 +2310,9 @@ define amdgpu_kernel void @global_workgroup_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -2317,6 +2352,7 @@ define amdgpu_kernel void @global_workgroup_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2507,6 +2543,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2669,7 +2706,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -2715,6 +2752,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -2877,7 +2915,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -3129,6 +3167,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3157,6 +3196,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3171,6 +3211,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -3187,6 +3228,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3209,6 +3251,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3223,6 +3266,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3253,6 +3297,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3283,6 +3328,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -3299,6 +3345,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
@@ -3313,6 +3360,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -3329,6 +3377,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3564,7 +3613,7 @@ define amdgpu_kernel void @global_workgroup_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
@@ -3596,6 +3645,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3625,6 +3675,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3641,6 +3692,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -3658,6 +3710,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3681,6 +3734,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3696,6 +3750,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3728,6 +3783,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3761,6 +3817,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -3778,6 +3835,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
@@ -3796,6 +3854,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -3811,8 +3870,9 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -3843,6 +3903,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -3872,6 +3933,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -3888,6 +3950,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -3905,6 +3968,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -3928,6 +3992,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -3943,6 +4008,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -3975,6 +4041,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -4008,6 +4075,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -4025,6 +4093,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
@@ -4043,6 +4112,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -4058,8 +4128,9 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4089,6 +4160,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4117,6 +4189,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4131,6 +4204,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -4147,6 +4221,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4169,6 +4244,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4183,6 +4259,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4213,6 +4290,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4243,6 +4321,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -4259,6 +4338,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
@@ -4273,6 +4353,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -4289,6 +4370,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4318,6 +4400,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4346,6 +4429,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4360,6 +4444,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -4376,6 +4461,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4398,6 +4484,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4412,6 +4499,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4442,6 +4530,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4472,6 +4561,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -4488,6 +4578,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acquire_acquire_cmpxchg:
@@ -4502,6 +4593,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -4518,6 +4610,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4548,6 +4641,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4577,6 +4671,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4593,6 +4688,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -4610,6 +4706,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4633,6 +4730,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4648,6 +4746,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4680,6 +4779,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4713,6 +4813,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -4730,6 +4831,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_release_acquire_cmpxchg:
@@ -4748,6 +4850,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -4763,8 +4866,9 @@ define amdgpu_kernel void @global_workgroup_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -4795,6 +4899,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4824,6 +4929,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4840,6 +4946,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -4857,6 +4964,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4880,6 +4988,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4895,6 +5004,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4927,6 +5037,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4960,6 +5071,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -4977,6 +5089,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
@@ -4995,6 +5108,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -5010,8 +5124,9 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5042,6 +5157,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5071,6 +5187,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5087,6 +5204,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -5104,6 +5222,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5127,6 +5246,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5142,6 +5262,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5174,6 +5295,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5207,6 +5329,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -5224,6 +5347,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
@@ -5242,6 +5366,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -5257,8 +5382,9 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5289,6 +5415,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5318,6 +5445,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5334,6 +5462,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -5351,6 +5480,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5374,6 +5504,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5389,6 +5520,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5421,6 +5553,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5454,6 +5587,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -5471,6 +5605,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
@@ -5489,6 +5624,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -5504,8 +5640,9 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5536,6 +5673,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5565,6 +5703,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5581,6 +5720,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -5598,6 +5738,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5621,6 +5762,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5636,6 +5778,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5668,6 +5811,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5701,6 +5845,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -5718,6 +5863,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
@@ -5736,6 +5882,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -5751,8 +5898,9 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -5783,6 +5931,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5812,6 +5961,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5828,6 +5978,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -5845,6 +5996,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5868,6 +6020,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5883,6 +6036,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5915,6 +6069,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5948,6 +6103,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -5965,6 +6121,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_release_seq_cst_cmpxchg:
@@ -5983,6 +6140,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -5998,8 +6156,9 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -6030,6 +6189,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6059,6 +6219,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6075,6 +6236,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -6092,6 +6254,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6115,6 +6278,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6130,6 +6294,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6162,6 +6327,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6195,6 +6361,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -6212,6 +6379,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
@@ -6230,6 +6398,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -6245,8 +6414,9 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -6277,6 +6447,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6306,6 +6477,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6322,6 +6494,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -6339,6 +6512,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6362,6 +6536,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6377,6 +6552,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6409,6 +6585,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6442,6 +6619,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -6459,6 +6637,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
@@ -6477,6 +6656,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -6492,8 +6672,9 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -6774,8 +6955,8 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -6805,6 +6986,7 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -6864,8 +7046,8 @@ define amdgpu_kernel void @global_workgroup_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7264,7 +7446,7 @@ define amdgpu_kernel void @global_workgroup_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7300,8 +7482,8 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7332,6 +7514,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7395,8 +7578,8 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7540,7 +7723,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7576,8 +7759,8 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7608,6 +7791,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7671,8 +7855,8 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -7816,7 +8000,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -7851,8 +8035,8 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -7882,6 +8066,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -7941,8 +8126,8 @@ define amdgpu_kernel void @global_workgroup_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8109,8 +8294,8 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8140,6 +8325,7 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8199,8 +8385,8 @@ define amdgpu_kernel void @global_workgroup_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8366,8 +8552,8 @@ define amdgpu_kernel void @global_workgroup_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8398,6 +8584,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8461,8 +8648,8 @@ define amdgpu_kernel void @global_workgroup_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8606,7 +8793,7 @@ define amdgpu_kernel void @global_workgroup_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8642,8 +8829,8 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8674,6 +8861,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -8737,8 +8925,8 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -8882,7 +9070,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -8918,8 +9106,8 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -8950,6 +9138,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9013,8 +9202,8 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9158,7 +9347,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9194,8 +9383,8 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9226,6 +9415,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9289,8 +9479,8 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9434,7 +9624,7 @@ define amdgpu_kernel void @global_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9470,8 +9660,8 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9502,6 +9692,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9565,8 +9756,8 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9708,7 +9899,7 @@ define amdgpu_kernel void @global_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -9744,8 +9935,8 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -9776,6 +9967,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -9839,8 +10031,8 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9984,7 +10176,7 @@ define amdgpu_kernel void @global_workgroup_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -10020,8 +10212,8 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -10052,6 +10244,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -10115,8 +10308,8 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10260,7 +10453,7 @@ define amdgpu_kernel void @global_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -10296,8 +10489,8 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -10328,6 +10521,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -10391,8 +10585,8 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10536,7 +10730,7 @@ define amdgpu_kernel void @global_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -10955,6 +11149,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_load(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -11128,6 +11323,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 s5, s14
; GFX6-NEXT: s_mov_b32 s6, s13
; GFX6-NEXT: s_mov_b32 s7, s12
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_load_dword v0, off, s[8:11], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -11143,7 +11339,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_load_dword v2, v[0:1]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -11194,6 +11392,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s1, s10
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s9
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s8
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_load_dword v0, off, s[4:7], 0
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -11292,6 +11491,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_load(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_load_b64 s[2:3], s[4:5], 0x0
; GFX12-CU-NEXT: s_load_b64 s[0:1], s[4:5], 0x8
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3]
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
@@ -11632,6 +11832,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -11646,6 +11847,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11668,6 +11870,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -11686,6 +11889,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11696,6 +11900,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11717,6 +11922,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11750,6 +11956,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -11774,6 +11981,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(1) %out) {
@@ -11798,6 +12006,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -11812,6 +12021,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_store_dword v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -11834,6 +12044,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -11852,6 +12063,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11862,6 +12074,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11883,6 +12096,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11916,6 +12130,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -11940,6 +12155,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_store(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, 0
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(1) %out) {
@@ -12118,6 +12334,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12132,6 +12349,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12142,6 +12360,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -12154,6 +12373,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12171,6 +12391,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12181,6 +12402,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12203,6 +12425,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12225,6 +12448,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -12237,6 +12461,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acquire_atomicrmw:
@@ -12247,6 +12472,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -12259,6 +12485,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -12281,6 +12508,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -12295,6 +12523,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
; GFX7-NEXT: s_endpgm
;
@@ -12317,6 +12546,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX10-CU-NEXT: s_endpgm
;
@@ -12334,6 +12564,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -12344,6 +12575,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12365,6 +12597,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -12398,6 +12631,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX11-CU-NEXT: s_endpgm
;
@@ -12422,6 +12656,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
@@ -12445,7 +12680,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12459,7 +12696,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12472,6 +12711,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -12483,7 +12723,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12500,7 +12742,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12510,7 +12754,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12533,7 +12779,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12559,6 +12807,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -12570,7 +12819,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
@@ -12585,6 +12836,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -12596,7 +12848,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -12619,7 +12873,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12633,7 +12889,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v[0:1], v2
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12646,6 +12904,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -12657,7 +12916,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12674,7 +12935,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12684,7 +12947,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[4:5]
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12707,7 +12972,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12733,6 +13000,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -12744,7 +13012,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
@@ -12759,6 +13029,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_swap_b32 v0, v1, s[0:1] scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -12770,7 +13041,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_swap_b32 v0, v1, s[0:1]
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in) {
entry:
@@ -12810,6 +13083,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -12981,6 +13255,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -12997,7 +13272,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -13026,6 +13303,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -13045,6 +13323,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -13057,6 +13336,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -13083,6 +13363,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -13124,6 +13405,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -13155,6 +13437,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -13181,6 +13464,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_mov_b32 s6, s10
; GFX6-NEXT: s_mov_b32 s7, s9
; GFX6-NEXT: v_mov_b32_e32 v0, s8
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_swap v0, off, s[4:7], 0 glc
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
@@ -13197,7 +13481,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s6
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_swap v2, v[0:1], v2 glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -13226,6 +13512,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -13245,6 +13532,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, s6
; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, s5
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s4
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_swap v0, off, s[0:3], 0 glc
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
@@ -13257,6 +13545,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s6, s[8:9], 0x8
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[4:5] glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -13283,6 +13572,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s2, s[4:5], 0x8
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[0:1] sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -13324,6 +13614,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -13355,6 +13646,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: s_load_b32 s2, s[4:5], 0x8
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_swap_b32 v1, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -13606,6 +13898,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13634,6 +13927,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13648,6 +13942,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -13664,6 +13959,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13686,6 +13982,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13700,6 +13997,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13730,6 +14028,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13760,6 +14059,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -13776,6 +14076,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -13790,6 +14091,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -13806,6 +14108,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -13834,6 +14137,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -13862,6 +14166,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
; GFX7-NEXT: s_endpgm
;
@@ -13892,6 +14197,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -13914,6 +14220,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -13928,6 +14235,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -13957,6 +14265,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -14002,6 +14311,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -14034,6 +14344,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
@@ -14063,7 +14374,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14091,7 +14404,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14108,6 +14423,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14123,7 +14439,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14145,7 +14463,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14159,7 +14479,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14190,7 +14512,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14224,6 +14548,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14239,7 +14564,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -14258,6 +14585,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14273,7 +14601,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14302,7 +14632,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14330,7 +14662,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14347,6 +14681,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14362,7 +14697,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14384,7 +14721,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14398,7 +14737,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14429,7 +14770,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14463,6 +14806,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14478,7 +14822,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -14497,6 +14843,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14512,7 +14859,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14542,6 +14891,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14570,6 +14920,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14584,6 +14935,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14600,6 +14952,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14622,6 +14975,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14636,6 +14990,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14666,6 +15021,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14696,6 +15052,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14712,6 +15069,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -14726,6 +15084,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14742,6 +15101,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14771,6 +15131,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14799,6 +15160,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14813,6 +15175,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -14829,6 +15192,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14851,6 +15215,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14865,6 +15230,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14895,6 +15261,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14925,6 +15292,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -14941,6 +15309,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -14955,6 +15324,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, v3
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -14971,6 +15341,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -14999,7 +15370,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15027,7 +15400,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15044,6 +15419,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15059,7 +15435,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15081,7 +15459,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15095,7 +15475,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15126,7 +15508,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15160,6 +15544,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15175,7 +15560,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
@@ -15194,6 +15581,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15209,7 +15597,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15238,7 +15628,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15266,7 +15658,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15283,6 +15677,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15298,7 +15693,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15320,7 +15717,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15334,7 +15733,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15365,7 +15766,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15399,6 +15802,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15414,7 +15818,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -15433,6 +15839,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15448,7 +15855,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15477,7 +15886,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15505,7 +15916,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15522,6 +15935,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15537,7 +15951,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15559,7 +15975,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15573,7 +15991,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15604,7 +16024,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15638,6 +16060,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15653,7 +16076,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -15672,6 +16097,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15687,7 +16113,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15716,7 +16144,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15744,7 +16174,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15761,6 +16193,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -15776,7 +16209,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15798,7 +16233,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15812,7 +16249,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15843,7 +16282,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15877,6 +16318,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -15892,7 +16334,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -15911,6 +16355,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -15926,7 +16371,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -15955,7 +16402,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -15983,7 +16432,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16000,6 +16451,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16015,7 +16467,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16037,7 +16491,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16051,7 +16507,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16082,7 +16540,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16116,6 +16576,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16131,7 +16592,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -16150,6 +16613,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16165,7 +16629,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -16194,7 +16660,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16222,7 +16690,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16239,6 +16709,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16254,7 +16725,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16276,7 +16749,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16290,7 +16765,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16321,7 +16798,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16355,6 +16834,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16370,7 +16850,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -16389,6 +16871,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16404,7 +16887,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -16433,7 +16918,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16461,7 +16948,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16478,6 +16967,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16493,7 +16983,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16515,7 +17007,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16529,7 +17023,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16560,7 +17056,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16594,6 +17092,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16609,7 +17108,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -16628,6 +17129,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16643,7 +17145,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -16672,7 +17176,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16700,7 +17206,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v[0:1], v[2:3]
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16717,6 +17225,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: buffer_gl0_inv
; GFX10-WGP-NEXT: s_endpgm
@@ -16732,7 +17241,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v0, v[1:2], s[4:5] offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16754,7 +17265,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16768,7 +17281,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[4:5] offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16799,7 +17314,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v[2:3], s[0:1] offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16833,6 +17350,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: buffer_gl0_inv
; GFX11-WGP-NEXT: s_endpgm
@@ -16848,7 +17366,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -16867,6 +17387,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_loadcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16 scope:SCOPE_SE
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
@@ -16882,7 +17403,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v0, v[1:2], s[0:1] offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %out, i32 %in, i32 %old) {
entry:
@@ -17163,8 +17686,8 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_ret_cmpxchg
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17194,6 +17717,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17253,8 +17777,8 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17418,6 +17942,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
@@ -17449,6 +17974,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
@@ -17485,6 +18011,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17509,6 +18036,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
@@ -17526,6 +18054,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17559,6 +18088,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17610,6 +18140,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17646,6 +18177,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_monotonic_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17679,9 +18211,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17710,7 +18243,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -17747,6 +18282,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17771,9 +18307,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -17788,6 +18325,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -17822,6 +18360,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -17875,6 +18414,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17914,6 +18454,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -17947,9 +18488,10 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -17978,7 +18520,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18015,6 +18559,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18039,9 +18584,10 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18056,6 +18602,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18090,6 +18637,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18143,6 +18691,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18182,6 +18731,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18216,8 +18766,8 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_ret_cmpxchg
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18247,6 +18797,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18306,8 +18857,8 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_acquire_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18474,8 +19025,8 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18505,6 +19056,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18564,8 +19116,8 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18729,9 +19281,10 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -18760,7 +19313,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -18797,6 +19352,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18821,9 +19377,10 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -18838,6 +19395,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -18872,6 +19430,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -18925,6 +19484,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18964,6 +19524,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -18997,9 +19558,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19028,7 +19590,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19065,6 +19629,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19089,9 +19654,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19106,6 +19672,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19140,6 +19707,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19193,6 +19761,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19232,6 +19801,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19265,9 +19835,10 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19296,7 +19867,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19333,6 +19906,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19357,9 +19931,10 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19374,6 +19949,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19408,6 +19984,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19461,6 +20038,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19500,6 +20078,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19533,9 +20112,10 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19564,7 +20144,9 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19601,6 +20183,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19625,9 +20208,10 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19642,6 +20226,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19676,6 +20261,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19729,6 +20315,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19768,6 +20355,7 @@ define amdgpu_kernel void @global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -19801,9 +20389,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -19832,7 +20421,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -19869,6 +20460,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19893,9 +20485,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -19910,6 +20503,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -19944,6 +20538,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -19997,6 +20592,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20034,6 +20630,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20067,9 +20664,10 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -20098,7 +20696,9 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -20135,6 +20735,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -20159,9 +20760,10 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -20176,6 +20778,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -20210,6 +20813,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -20263,6 +20867,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20302,6 +20907,7 @@ define amdgpu_kernel void @global_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20335,9 +20941,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -20366,7 +20973,9 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -20403,6 +21012,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -20427,9 +21037,10 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -20444,6 +21055,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -20478,6 +21090,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -20531,6 +21144,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20570,6 +21184,7 @@ define amdgpu_kernel void @global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20603,9 +21218,10 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v2, s8
; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: v_mov_b32_e32 v1, v2
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: buffer_atomic_cmpswap v[0:1], off, s[4:7], 0 offset:16 glc
-; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: s_waitcnt vmcnt(0)
+; GFX6-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX6-NEXT: s_endpgm
;
@@ -20634,7 +21250,9 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v3, v0
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s7
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt vmcnt(0)
@@ -20671,6 +21289,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v3, s6
; GFX10-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX10-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: global_atomic_cmpswap v1, v0, v[1:2], s[4:5] offset:16 glc
; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
; GFX10-CU-NEXT: global_store_dword v0, v1, s[4:5]
@@ -20695,9 +21314,10 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s4
; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, v2
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: buffer_atomic_cmpswap v[0:1], off, s[0:3], 0 offset:16 glc
-; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
; SKIP-CACHE-INV-NEXT: buffer_store_dword v0, off, s[0:3], 0
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -20712,6 +21332,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[4:5] offset:16 glc
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[4:5]
@@ -20746,6 +21367,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: global_atomic_cmpswap v1, v0, v[2:3], s[0:1] offset:16 sc0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
@@ -20799,6 +21421,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX11-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX11-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 glc
; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
; GFX11-CU-NEXT: global_store_b32 v0, v1, s[0:1]
@@ -20838,6 +21461,7 @@ define amdgpu_kernel void @global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v3, s2
; GFX12-CU-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GFX12-CU-NEXT: v_mov_b32_e32 v2, v3
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: global_atomic_cmpswap_b32 v1, v0, v[1:2], s[0:1] offset:16 th:TH_ATOMIC_RETURN
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll
index 0467c5047a0be..cc110000538f6 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll
@@ -444,6 +444,7 @@ define amdgpu_kernel void @local_agent_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -469,6 +470,7 @@ define amdgpu_kernel void @local_agent_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -507,7 +509,7 @@ define amdgpu_kernel void @local_agent_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -520,7 +522,7 @@ define amdgpu_kernel void @local_agent_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -628,6 +630,7 @@ define amdgpu_kernel void @local_agent_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -655,6 +658,7 @@ define amdgpu_kernel void @local_agent_seq_cst_load(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -700,7 +704,7 @@ define amdgpu_kernel void @local_agent_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -712,9 +716,9 @@ define amdgpu_kernel void @local_agent_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -1154,7 +1158,7 @@ define amdgpu_kernel void @local_agent_release_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
@@ -1312,7 +1316,7 @@ define amdgpu_kernel void @local_agent_seq_cst_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
@@ -1541,6 +1545,7 @@ define amdgpu_kernel void @local_agent_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1563,6 +1568,7 @@ define amdgpu_kernel void @local_agent_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1597,7 +1603,7 @@ define amdgpu_kernel void @local_agent_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1609,7 +1615,7 @@ define amdgpu_kernel void @local_agent_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1766,7 +1772,7 @@ define amdgpu_kernel void @local_agent_release_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
@@ -1863,6 +1869,7 @@ define amdgpu_kernel void @local_agent_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1887,6 +1894,7 @@ define amdgpu_kernel void @local_agent_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1928,7 +1936,7 @@ define amdgpu_kernel void @local_agent_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1939,9 +1947,9 @@ define amdgpu_kernel void @local_agent_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -2037,6 +2045,7 @@ define amdgpu_kernel void @local_agent_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -2061,6 +2070,7 @@ define amdgpu_kernel void @local_agent_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -2102,7 +2112,7 @@ define amdgpu_kernel void @local_agent_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2113,9 +2123,9 @@ define amdgpu_kernel void @local_agent_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -2218,6 +2228,7 @@ define amdgpu_kernel void @local_agent_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2245,6 +2256,7 @@ define amdgpu_kernel void @local_agent_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2286,7 +2298,7 @@ define amdgpu_kernel void @local_agent_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2300,7 +2312,7 @@ define amdgpu_kernel void @local_agent_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2414,6 +2426,7 @@ define amdgpu_kernel void @local_agent_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2443,6 +2456,7 @@ define amdgpu_kernel void @local_agent_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2491,7 +2505,7 @@ define amdgpu_kernel void @local_agent_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2504,9 +2518,9 @@ define amdgpu_kernel void @local_agent_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2620,6 +2634,7 @@ define amdgpu_kernel void @local_agent_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2649,6 +2664,7 @@ define amdgpu_kernel void @local_agent_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2697,7 +2713,7 @@ define amdgpu_kernel void @local_agent_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2710,9 +2726,9 @@ define amdgpu_kernel void @local_agent_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2984,6 +3000,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3010,6 +3027,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3050,7 +3068,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3064,7 +3082,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3248,7 +3266,7 @@ define amdgpu_kernel void @local_agent_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
@@ -3360,6 +3378,7 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3388,6 +3407,7 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3435,7 +3455,7 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3448,9 +3468,9 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3561,6 +3581,7 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3589,6 +3610,7 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3636,7 +3658,7 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3649,9 +3671,9 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3754,6 +3776,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3780,6 +3803,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3820,7 +3844,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3834,7 +3858,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3937,6 +3961,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3963,6 +3988,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4003,7 +4029,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4017,7 +4043,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4128,6 +4154,7 @@ define amdgpu_kernel void @local_agent_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4156,6 +4183,7 @@ define amdgpu_kernel void @local_agent_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4203,7 +4231,7 @@ define amdgpu_kernel void @local_agent_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4216,9 +4244,9 @@ define amdgpu_kernel void @local_agent_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4329,6 +4357,7 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4357,6 +4386,7 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4404,7 +4434,7 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4417,9 +4447,9 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4530,6 +4560,7 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4558,6 +4589,7 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4605,7 +4637,7 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4618,9 +4650,9 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4731,6 +4763,7 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4759,6 +4792,7 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4806,7 +4840,7 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4819,9 +4853,9 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4932,6 +4966,7 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4960,6 +4995,7 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5007,7 +5043,7 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5020,9 +5056,9 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5133,6 +5169,7 @@ define amdgpu_kernel void @local_agent_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5161,6 +5198,7 @@ define amdgpu_kernel void @local_agent_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5208,7 +5246,7 @@ define amdgpu_kernel void @local_agent_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5221,9 +5259,9 @@ define amdgpu_kernel void @local_agent_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5334,6 +5372,7 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5362,6 +5401,7 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5409,7 +5449,7 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5422,9 +5462,9 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5535,6 +5575,7 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5563,6 +5604,7 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5610,7 +5652,7 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5623,9 +5665,9 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5954,6 +5996,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5985,6 +6028,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6032,7 +6076,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6048,7 +6092,7 @@ define amdgpu_kernel void @local_agent_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6275,7 +6319,7 @@ define amdgpu_kernel void @local_agent_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
@@ -6407,6 +6451,7 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6440,6 +6485,7 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6494,7 +6540,7 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6509,9 +6555,9 @@ define amdgpu_kernel void @local_agent_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6641,6 +6687,7 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6674,6 +6721,7 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6728,7 +6776,7 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6743,9 +6791,9 @@ define amdgpu_kernel void @local_agent_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6867,6 +6915,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6898,6 +6947,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6945,7 +6995,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6961,7 +7011,7 @@ define amdgpu_kernel void @local_agent_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7083,6 +7133,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7114,6 +7165,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7161,7 +7213,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7177,7 +7229,7 @@ define amdgpu_kernel void @local_agent_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7307,6 +7359,7 @@ define amdgpu_kernel void @local_agent_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7340,6 +7393,7 @@ define amdgpu_kernel void @local_agent_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7394,7 +7448,7 @@ define amdgpu_kernel void @local_agent_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7409,9 +7463,9 @@ define amdgpu_kernel void @local_agent_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7541,6 +7595,7 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7574,6 +7629,7 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7628,7 +7684,7 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7643,9 +7699,9 @@ define amdgpu_kernel void @local_agent_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7775,6 +7831,7 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7808,6 +7865,7 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7862,7 +7920,7 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7877,9 +7935,9 @@ define amdgpu_kernel void @local_agent_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8009,6 +8067,7 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8042,6 +8101,7 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8096,7 +8156,7 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8111,9 +8171,9 @@ define amdgpu_kernel void @local_agent_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8243,6 +8303,7 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8276,6 +8337,7 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8330,7 +8392,7 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8345,9 +8407,9 @@ define amdgpu_kernel void @local_agent_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8477,6 +8539,7 @@ define amdgpu_kernel void @local_agent_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8510,6 +8573,7 @@ define amdgpu_kernel void @local_agent_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8564,7 +8628,7 @@ define amdgpu_kernel void @local_agent_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8579,9 +8643,9 @@ define amdgpu_kernel void @local_agent_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8711,6 +8775,7 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8744,6 +8809,7 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8798,7 +8864,7 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8813,9 +8879,9 @@ define amdgpu_kernel void @local_agent_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8945,6 +9011,7 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8978,6 +9045,7 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -9032,7 +9100,7 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9047,9 +9115,9 @@ define amdgpu_kernel void @local_agent_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -9415,6 +9483,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -9429,6 +9498,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -9442,6 +9512,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -9454,6 +9525,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -9467,6 +9539,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -9480,6 +9553,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9492,6 +9566,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9504,6 +9579,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9516,6 +9592,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9528,6 +9605,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -9540,6 +9618,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -9552,6 +9631,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9564,6 +9644,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -9585,7 +9666,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -9599,7 +9682,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -9612,7 +9697,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -9624,7 +9711,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -9637,7 +9726,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -9650,7 +9741,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9662,7 +9755,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9674,7 +9769,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9686,7 +9783,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9698,7 +9797,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -9710,7 +9811,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -9722,7 +9825,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9734,7 +9839,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10036,6 +10143,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10047,6 +10155,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10057,6 +10166,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10067,6 +10177,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10078,6 +10189,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10088,6 +10200,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10098,6 +10211,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10108,6 +10222,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10118,6 +10233,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10128,6 +10244,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10138,6 +10255,7 @@ define amdgpu_kernel void @local_agent_one_as_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10176,6 +10294,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10187,6 +10306,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10197,6 +10317,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10207,6 +10328,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10218,6 +10340,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10228,6 +10351,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10238,6 +10362,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10248,6 +10373,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10258,6 +10384,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10268,6 +10395,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10278,6 +10406,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10457,6 +10586,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10468,6 +10598,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10478,6 +10609,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10488,6 +10620,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10499,6 +10632,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10509,6 +10643,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10519,6 +10654,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10529,6 +10665,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10539,6 +10676,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10549,6 +10687,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10559,6 +10698,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10569,6 +10709,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acquire_atomicrmw:
@@ -10579,6 +10720,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10596,6 +10738,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10607,6 +10750,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10617,6 +10761,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10627,6 +10772,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10638,6 +10784,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10648,6 +10795,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10658,6 +10806,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10668,6 +10817,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10678,6 +10828,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10688,6 +10839,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10698,6 +10850,7 @@ define amdgpu_kernel void @local_agent_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10736,7 +10889,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10747,7 +10902,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10757,7 +10914,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10767,7 +10926,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10778,7 +10939,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10788,7 +10951,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10798,7 +10963,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10808,7 +10975,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10818,7 +10987,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10828,7 +10999,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10838,7 +11011,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10849,6 +11024,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acq_rel_atomicrmw:
@@ -10859,6 +11035,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10876,7 +11053,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10887,7 +11066,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10897,7 +11078,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10907,7 +11090,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10918,7 +11103,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10928,7 +11115,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10938,7 +11127,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10948,7 +11139,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10958,7 +11151,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10968,7 +11163,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10978,7 +11175,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10989,6 +11188,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_seq_cst_atomicrmw:
@@ -10999,6 +11199,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -11017,6 +11218,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11032,6 +11234,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11046,6 +11249,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11059,6 +11263,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11073,6 +11278,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11087,6 +11293,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11100,6 +11307,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11113,6 +11321,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11126,6 +11335,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11139,6 +11349,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11152,6 +11363,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11165,6 +11377,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11178,6 +11391,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11199,7 +11413,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11214,7 +11430,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11228,7 +11446,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11241,7 +11461,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11255,7 +11477,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11269,7 +11493,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11282,7 +11508,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11295,7 +11523,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11308,7 +11538,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11321,7 +11553,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11334,7 +11568,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11348,6 +11584,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11361,6 +11598,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11382,7 +11620,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11397,7 +11637,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11411,7 +11653,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11424,7 +11668,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11438,7 +11684,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11452,7 +11700,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11465,7 +11715,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11478,7 +11730,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11491,7 +11745,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11504,7 +11760,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11517,7 +11775,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11531,6 +11791,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11544,6 +11805,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11735,6 +11997,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11748,6 +12011,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11760,6 +12024,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11772,6 +12037,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11785,6 +12051,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11797,6 +12064,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11809,6 +12077,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11821,6 +12090,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11833,6 +12103,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11845,6 +12116,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11857,6 +12129,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11869,6 +12142,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
@@ -11881,6 +12155,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11901,6 +12176,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -11914,6 +12190,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -11926,6 +12203,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11938,6 +12216,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -11951,6 +12230,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11963,6 +12243,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11975,6 +12256,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11987,6 +12269,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11999,6 +12282,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -12011,6 +12295,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -12023,6 +12308,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -12068,7 +12354,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12081,7 +12369,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12093,7 +12383,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12105,7 +12397,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12118,7 +12412,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12130,7 +12426,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12142,7 +12440,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12154,7 +12454,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12166,7 +12468,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12178,7 +12482,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12190,7 +12496,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12203,6 +12511,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
@@ -12215,6 +12524,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12235,7 +12545,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12248,7 +12560,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12260,7 +12574,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12272,7 +12588,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12285,7 +12603,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12297,7 +12617,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12309,7 +12631,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12321,7 +12645,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12333,7 +12659,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12345,7 +12673,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12357,7 +12687,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12370,6 +12702,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
@@ -12382,6 +12715,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12403,6 +12737,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12416,6 +12751,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12428,6 +12764,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12440,6 +12777,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12453,6 +12791,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12465,6 +12804,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12477,6 +12817,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12489,6 +12830,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12501,6 +12843,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12513,6 +12856,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12525,6 +12869,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12537,6 +12882,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
@@ -12549,6 +12895,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12570,6 +12917,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12583,6 +12931,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12595,6 +12944,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12607,6 +12957,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12620,6 +12971,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12632,6 +12984,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12644,6 +12997,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12656,6 +13010,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12668,6 +13023,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12680,6 +13036,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12692,6 +13049,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12704,6 +13062,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
@@ -12716,6 +13075,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12736,7 +13096,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12749,7 +13111,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12761,7 +13125,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12773,7 +13139,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12786,7 +13154,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12798,7 +13168,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12810,7 +13182,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12822,7 +13196,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12834,7 +13210,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12846,7 +13224,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12858,7 +13238,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12871,6 +13253,7 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_release_acquire_cmpxchg:
@@ -12883,6 +13266,7 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12903,7 +13287,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12916,7 +13302,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12928,7 +13316,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12940,7 +13330,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12953,7 +13345,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12965,7 +13359,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12977,7 +13373,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -12989,7 +13387,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -13001,7 +13401,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -13013,7 +13415,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -13025,7 +13429,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -13038,6 +13444,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
@@ -13050,6 +13457,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13070,7 +13478,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13083,7 +13493,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13095,7 +13507,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13107,7 +13521,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13120,7 +13536,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13132,7 +13550,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13144,7 +13564,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13156,7 +13578,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13168,7 +13592,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13180,7 +13606,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13192,7 +13620,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13205,6 +13635,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
@@ -13217,6 +13648,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13237,7 +13669,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13250,7 +13684,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13262,7 +13698,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13274,7 +13712,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13287,7 +13727,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13299,7 +13741,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13311,7 +13755,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13323,7 +13769,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13335,7 +13783,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13347,7 +13797,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13359,7 +13811,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13372,6 +13826,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
@@ -13384,6 +13839,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13404,7 +13860,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13417,7 +13875,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13429,7 +13889,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13441,7 +13903,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13454,7 +13918,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13466,7 +13932,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13478,7 +13946,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13490,7 +13960,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13502,7 +13974,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13514,7 +13988,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13526,7 +14002,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13539,6 +14017,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
@@ -13551,6 +14030,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13571,7 +14051,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13584,7 +14066,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13596,7 +14080,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13608,7 +14094,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13621,7 +14109,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13633,7 +14123,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13645,7 +14137,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13657,7 +14151,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13669,7 +14165,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13681,7 +14179,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13693,7 +14193,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13706,6 +14208,7 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
@@ -13718,6 +14221,7 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13738,7 +14242,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13751,7 +14257,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13763,7 +14271,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13775,7 +14285,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13788,7 +14300,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13800,7 +14314,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13812,7 +14328,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13824,7 +14342,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13836,7 +14356,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13848,7 +14370,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13860,7 +14384,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13873,6 +14399,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13885,6 +14412,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13905,7 +14433,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13918,7 +14448,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13930,7 +14462,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13942,7 +14476,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13955,7 +14491,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13967,7 +14505,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13979,7 +14519,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13991,7 +14533,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14003,7 +14547,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14015,7 +14561,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14027,7 +14575,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14040,6 +14590,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14052,6 +14603,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -14284,6 +14836,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14301,6 +14854,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14317,6 +14871,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14332,6 +14887,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14348,6 +14904,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14364,6 +14921,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14379,6 +14937,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14394,6 +14953,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14409,6 +14969,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14424,6 +14985,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14439,6 +15001,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14454,6 +15017,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14469,6 +15033,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14494,6 +15059,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -14511,6 +15077,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -14527,6 +15094,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -14542,6 +15110,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -14558,6 +15127,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -14574,6 +15144,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14589,6 +15160,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14604,6 +15176,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14619,6 +15192,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14634,6 +15208,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -14649,6 +15224,7 @@ define amdgpu_kernel void @local_agent_one_as_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -14705,7 +15281,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14722,7 +15300,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14738,7 +15318,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14753,7 +15335,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14769,7 +15353,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14785,7 +15371,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14800,7 +15388,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14815,7 +15405,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14830,7 +15422,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14845,7 +15439,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14860,7 +15456,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14876,6 +15474,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14891,6 +15490,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14916,7 +15516,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14933,7 +15535,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14949,7 +15553,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14964,7 +15570,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14980,7 +15588,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14996,7 +15606,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15011,7 +15623,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15026,7 +15640,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15041,7 +15657,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15056,7 +15674,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15071,7 +15691,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15087,6 +15709,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15102,6 +15725,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15128,6 +15752,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15145,6 +15770,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15161,6 +15787,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15176,6 +15803,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15192,6 +15820,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15208,6 +15837,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15223,6 +15853,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15238,6 +15869,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15253,6 +15885,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15268,6 +15901,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15283,6 +15917,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15298,6 +15933,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15313,6 +15949,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15339,6 +15976,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15356,6 +15994,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15372,6 +16011,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15387,6 +16027,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15403,6 +16044,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15419,6 +16061,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15434,6 +16077,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15449,6 +16093,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15464,6 +16109,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15479,6 +16125,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15494,6 +16141,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15509,6 +16157,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15524,6 +16173,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15549,7 +16199,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15566,7 +16218,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15582,7 +16236,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15597,7 +16253,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15613,7 +16271,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15629,7 +16289,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15644,7 +16306,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15659,7 +16323,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15674,7 +16340,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15689,7 +16357,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15704,7 +16374,9 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15720,6 +16392,7 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15735,6 +16408,7 @@ define amdgpu_kernel void @local_agent_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15760,7 +16434,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15777,7 +16453,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15793,7 +16471,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15808,7 +16488,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15824,7 +16506,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15840,7 +16524,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15855,7 +16541,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15870,7 +16558,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15885,7 +16575,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15900,7 +16592,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15915,7 +16609,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15931,6 +16627,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15946,6 +16643,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15971,7 +16669,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15988,7 +16688,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16004,7 +16706,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16019,7 +16723,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16035,7 +16741,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16051,7 +16759,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16066,7 +16776,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16081,7 +16793,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16096,7 +16810,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16111,7 +16827,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16126,7 +16844,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16142,6 +16862,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16157,6 +16878,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16182,7 +16904,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16199,7 +16923,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16215,7 +16941,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16230,7 +16958,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16246,7 +16976,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16262,7 +16994,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16277,7 +17011,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16292,7 +17028,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16307,7 +17045,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16322,7 +17062,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16337,7 +17079,9 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16353,6 +17097,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16368,6 +17113,7 @@ define amdgpu_kernel void @local_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16393,7 +17139,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16410,7 +17158,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16426,7 +17176,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16441,7 +17193,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16457,7 +17211,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16473,7 +17229,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16488,7 +17246,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16503,7 +17263,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16518,7 +17280,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16533,7 +17297,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16548,7 +17314,9 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16564,6 +17332,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16579,6 +17348,7 @@ define amdgpu_kernel void @local_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16604,7 +17374,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16621,7 +17393,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16637,7 +17411,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16652,7 +17428,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16668,7 +17446,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16684,7 +17464,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16699,7 +17481,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16714,7 +17498,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16729,7 +17515,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16744,7 +17532,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16759,7 +17549,9 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16775,6 +17567,7 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16790,6 +17583,7 @@ define amdgpu_kernel void @local_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16815,7 +17609,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16832,7 +17628,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16848,7 +17646,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16863,7 +17663,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16879,7 +17681,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16895,7 +17699,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16910,7 +17716,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16925,7 +17733,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16940,7 +17750,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16955,7 +17767,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16970,7 +17784,9 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16986,6 +17802,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -17001,6 +17818,7 @@ define amdgpu_kernel void @local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -17026,7 +17844,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -17043,7 +17863,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -17059,7 +17881,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -17074,7 +17898,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -17090,7 +17916,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -17106,7 +17934,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17121,7 +17951,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17136,7 +17968,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17151,7 +17985,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17166,7 +18002,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -17181,7 +18019,9 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -17197,6 +18037,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -17212,6 +18053,7 @@ define amdgpu_kernel void @local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
index 78209ee34cad4..b2489f612ee62 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
@@ -845,6 +845,7 @@ define amdgpu_kernel void @local_nontemporal_volatile_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: ds_read_b32 v2, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -976,7 +977,7 @@ define amdgpu_kernel void @local_nontemporal_volatile_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: ds_load_b32 v1, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-WGP-NEXT: s_endpgm
;
@@ -988,7 +989,7 @@ define amdgpu_kernel void @local_nontemporal_volatile_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: ds_load_b32 v1, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %in, ptr addrspace(1) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll
index f84d451f8ecb0..8aee1e8432ee1 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll
@@ -366,6 +366,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -380,6 +381,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -393,6 +395,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -405,6 +408,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -418,6 +422,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -431,6 +436,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -443,6 +449,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -455,6 +462,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -467,6 +475,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -479,6 +488,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -491,6 +501,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -503,6 +514,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -515,6 +527,7 @@ define amdgpu_kernel void @local_singlethread_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -536,7 +549,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -550,7 +565,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -563,7 +580,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -575,7 +594,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -588,7 +609,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -601,7 +624,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -613,7 +638,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -625,7 +652,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -637,7 +666,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -649,7 +680,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -661,7 +694,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -673,7 +708,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -685,7 +722,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -987,6 +1026,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -998,6 +1038,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -1008,6 +1049,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1018,6 +1060,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -1029,6 +1072,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1039,6 +1083,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1049,6 +1094,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1059,6 +1105,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1069,6 +1116,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1079,6 +1127,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1089,6 +1138,7 @@ define amdgpu_kernel void @local_singlethread_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -1127,6 +1177,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -1138,6 +1189,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -1148,6 +1200,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1158,6 +1211,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -1169,6 +1223,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1179,6 +1234,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1189,6 +1245,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1199,6 +1256,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1209,6 +1267,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1219,6 +1278,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1229,6 +1289,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -1408,6 +1469,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1419,6 +1481,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1429,6 +1492,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1439,6 +1503,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1450,6 +1515,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1460,6 +1526,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1470,6 +1537,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1480,6 +1548,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1490,6 +1559,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1500,6 +1570,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1510,6 +1581,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1520,6 +1592,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acquire_atomicrmw:
@@ -1530,6 +1603,7 @@ define amdgpu_kernel void @local_singlethread_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1547,6 +1621,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -1558,6 +1633,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -1568,6 +1644,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1578,6 +1655,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -1589,6 +1667,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1599,6 +1678,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1609,6 +1689,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1619,6 +1700,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1629,6 +1711,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1639,6 +1722,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1649,6 +1733,7 @@ define amdgpu_kernel void @local_singlethread_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -1687,7 +1772,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1698,7 +1785,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1708,7 +1797,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1718,7 +1809,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1729,7 +1822,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1739,7 +1834,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1749,7 +1846,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1759,7 +1858,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1769,7 +1870,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1779,7 +1882,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1789,7 +1894,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1800,6 +1907,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acq_rel_atomicrmw:
@@ -1810,6 +1918,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1827,7 +1936,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1838,7 +1949,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1848,7 +1961,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1858,7 +1973,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1869,7 +1986,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1879,7 +1998,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1889,7 +2010,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1899,7 +2022,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1909,7 +2034,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1919,7 +2046,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1929,7 +2058,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1940,6 +2071,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_seq_cst_atomicrmw:
@@ -1950,6 +2082,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1968,6 +2101,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -1983,6 +2117,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -1997,6 +2132,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -2010,6 +2146,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -2024,6 +2161,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -2038,6 +2176,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2051,6 +2190,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2064,6 +2204,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2077,6 +2218,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2090,6 +2232,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -2103,6 +2246,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -2116,6 +2260,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2129,6 +2274,7 @@ define amdgpu_kernel void @local_singlethread_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -2150,7 +2296,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -2165,7 +2313,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -2179,7 +2329,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -2192,7 +2344,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -2206,7 +2360,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -2220,7 +2376,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2233,7 +2391,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2246,7 +2406,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2259,7 +2421,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2272,7 +2436,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -2285,7 +2451,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -2299,6 +2467,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2312,6 +2481,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -2333,7 +2503,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -2348,7 +2520,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -2362,7 +2536,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -2375,7 +2551,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -2389,7 +2567,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -2403,7 +2583,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2416,7 +2598,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2429,7 +2613,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2442,7 +2628,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2455,7 +2643,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -2468,7 +2658,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -2482,6 +2674,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2495,6 +2688,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -2686,6 +2880,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2699,6 +2894,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2711,6 +2907,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2723,6 +2920,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2736,6 +2934,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2748,6 +2947,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2760,6 +2960,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2772,6 +2973,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2784,6 +2986,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2796,6 +2999,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2808,6 +3012,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2820,6 +3025,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
@@ -2832,6 +3038,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -2852,6 +3059,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -2865,6 +3073,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -2877,6 +3086,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -2889,6 +3099,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -2902,6 +3113,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -2914,6 +3126,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -2926,6 +3139,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -2938,6 +3152,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -2950,6 +3165,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -2962,6 +3178,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -2974,6 +3191,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -3019,7 +3237,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3032,7 +3252,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3044,7 +3266,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3056,7 +3280,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3069,7 +3295,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3081,7 +3309,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3093,7 +3323,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3105,7 +3337,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3117,7 +3351,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3129,7 +3365,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3141,7 +3379,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3154,6 +3394,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
@@ -3166,6 +3407,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3186,7 +3428,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3199,7 +3443,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3211,7 +3457,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3223,7 +3471,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3236,7 +3486,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3248,7 +3500,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3260,7 +3514,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3272,7 +3528,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3284,7 +3542,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3296,7 +3556,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3308,7 +3570,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3321,6 +3585,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
@@ -3333,6 +3598,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3354,6 +3620,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3367,6 +3634,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3379,6 +3647,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3391,6 +3660,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3404,6 +3674,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3416,6 +3687,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3428,6 +3700,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3440,6 +3713,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3452,6 +3726,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3464,6 +3739,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3476,6 +3752,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3488,6 +3765,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
@@ -3500,6 +3778,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3521,6 +3800,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3534,6 +3814,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3546,6 +3827,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3558,6 +3840,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3571,6 +3854,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3583,6 +3867,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3595,6 +3880,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3607,6 +3893,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3619,6 +3906,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3631,6 +3919,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3643,6 +3932,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3655,6 +3945,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acquire_acquire_cmpxchg:
@@ -3667,6 +3958,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3687,7 +3979,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3700,7 +3994,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3712,7 +4008,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3724,7 +4022,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3737,7 +4037,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3749,7 +4051,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3761,7 +4065,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3773,7 +4079,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3785,7 +4093,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3797,7 +4107,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3809,7 +4121,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3822,6 +4136,7 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_release_acquire_cmpxchg:
@@ -3834,6 +4149,7 @@ define amdgpu_kernel void @local_singlethread_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3854,7 +4170,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3867,7 +4185,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3879,7 +4199,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3891,7 +4213,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3904,7 +4228,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3916,7 +4242,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3928,7 +4256,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3940,7 +4270,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3952,7 +4284,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3964,7 +4298,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3976,7 +4312,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -3989,6 +4327,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
@@ -4001,6 +4340,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4021,7 +4361,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4034,7 +4376,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4046,7 +4390,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4058,7 +4404,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4071,7 +4419,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4083,7 +4433,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4095,7 +4447,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4107,7 +4461,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4119,7 +4475,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4131,7 +4489,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4143,7 +4503,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4156,6 +4518,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
@@ -4168,6 +4531,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4188,7 +4552,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4201,7 +4567,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4213,7 +4581,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4225,7 +4595,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4238,7 +4610,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4250,7 +4624,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4262,7 +4638,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4274,7 +4652,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4286,7 +4666,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4298,7 +4680,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4310,7 +4694,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4323,6 +4709,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
@@ -4335,6 +4722,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4355,7 +4743,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4368,7 +4758,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4380,7 +4772,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4392,7 +4786,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4405,7 +4801,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4417,7 +4815,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4429,7 +4829,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4441,7 +4843,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4453,7 +4857,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4465,7 +4871,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4477,7 +4885,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4490,6 +4900,7 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
@@ -4502,6 +4913,7 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4522,7 +4934,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4535,7 +4949,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4547,7 +4963,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4559,7 +4977,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4572,7 +4992,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4584,7 +5006,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4596,7 +5020,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4608,7 +5034,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4620,7 +5048,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4632,7 +5062,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4644,7 +5076,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4657,6 +5091,7 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_release_seq_cst_cmpxchg:
@@ -4669,6 +5104,7 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4689,7 +5125,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4702,7 +5140,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4714,7 +5154,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4726,7 +5168,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4739,7 +5183,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4751,7 +5197,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4763,7 +5211,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4775,7 +5225,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4787,7 +5239,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4799,7 +5253,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4811,7 +5267,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4824,6 +5282,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
@@ -4836,6 +5295,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4856,7 +5316,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4869,7 +5331,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4881,7 +5345,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4893,7 +5359,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4906,7 +5374,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4918,7 +5388,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4930,7 +5402,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4942,7 +5416,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4954,7 +5430,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4966,7 +5444,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4978,7 +5458,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -4991,6 +5473,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
@@ -5003,6 +5486,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5235,6 +5719,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -5252,6 +5737,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -5268,6 +5754,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -5283,6 +5770,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -5299,6 +5787,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -5315,6 +5804,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5330,6 +5820,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5345,6 +5836,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5360,6 +5852,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5375,6 +5868,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -5390,6 +5884,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -5405,6 +5900,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -5420,6 +5916,7 @@ define amdgpu_kernel void @local_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -5445,6 +5942,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -5462,6 +5960,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -5478,6 +5977,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -5493,6 +5993,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -5509,6 +6010,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -5525,6 +6027,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5540,6 +6043,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5555,6 +6059,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5570,6 +6075,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5585,6 +6091,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -5600,6 +6107,7 @@ define amdgpu_kernel void @local_singlethread_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -5656,7 +6164,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -5673,7 +6183,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -5689,7 +6201,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -5704,7 +6218,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -5720,7 +6236,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -5736,7 +6254,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5751,7 +6271,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5766,7 +6288,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5781,7 +6305,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5796,7 +6322,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -5811,7 +6339,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -5827,6 +6357,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -5842,6 +6373,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -5867,7 +6399,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -5884,7 +6418,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -5900,7 +6436,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -5915,7 +6453,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -5931,7 +6471,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -5947,7 +6489,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5962,7 +6506,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5977,7 +6523,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5992,7 +6540,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6007,7 +6557,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6022,7 +6574,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6038,6 +6592,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6053,6 +6608,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6079,6 +6635,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6096,6 +6653,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6112,6 +6670,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6127,6 +6686,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6143,6 +6703,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6159,6 +6720,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6174,6 +6736,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6189,6 +6752,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6204,6 +6768,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6219,6 +6784,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6234,6 +6800,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6249,6 +6816,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6264,6 +6832,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6290,6 +6859,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6307,6 +6877,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6323,6 +6894,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6338,6 +6910,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6354,6 +6927,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6370,6 +6944,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6385,6 +6960,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6400,6 +6976,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6415,6 +6992,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6430,6 +7008,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6445,6 +7024,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6460,6 +7040,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6475,6 +7056,7 @@ define amdgpu_kernel void @local_singlethread_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6500,7 +7082,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6517,7 +7101,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6533,7 +7119,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6548,7 +7136,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6564,7 +7154,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6580,7 +7172,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6595,7 +7189,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6610,7 +7206,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6625,7 +7223,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6640,7 +7240,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6655,7 +7257,9 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6671,6 +7275,7 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6686,6 +7291,7 @@ define amdgpu_kernel void @local_singlethread_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6711,7 +7317,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6728,7 +7336,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6744,7 +7354,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6759,7 +7371,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6775,7 +7389,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6791,7 +7407,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6806,7 +7424,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6821,7 +7441,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6836,7 +7458,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6851,7 +7475,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6866,7 +7492,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6882,6 +7510,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6897,6 +7526,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6922,7 +7552,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6939,7 +7571,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6955,7 +7589,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6970,7 +7606,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6986,7 +7624,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7002,7 +7642,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7017,7 +7659,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7032,7 +7676,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7047,7 +7693,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7062,7 +7710,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7077,7 +7727,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7093,6 +7745,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7108,6 +7761,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7133,7 +7787,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7150,7 +7806,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7166,7 +7824,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7181,7 +7841,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7197,7 +7859,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7213,7 +7877,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7228,7 +7894,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7243,7 +7911,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7258,7 +7928,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7273,7 +7945,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7288,7 +7962,9 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7304,6 +7980,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7319,6 +7996,7 @@ define amdgpu_kernel void @local_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7344,7 +8022,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7361,7 +8041,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7377,7 +8059,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7392,7 +8076,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7408,7 +8094,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7424,7 +8112,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7439,7 +8129,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7454,7 +8146,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7469,7 +8163,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7484,7 +8180,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7499,7 +8197,9 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7515,6 +8215,7 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7530,6 +8231,7 @@ define amdgpu_kernel void @local_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7555,7 +8257,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7572,7 +8276,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7588,7 +8294,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7603,7 +8311,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7619,7 +8329,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7635,7 +8347,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7650,7 +8364,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7665,7 +8381,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7680,7 +8398,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7695,7 +8415,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7710,7 +8432,9 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7726,6 +8450,7 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7741,6 +8466,7 @@ define amdgpu_kernel void @local_singlethread_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7766,7 +8492,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7783,7 +8511,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7799,7 +8529,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7814,7 +8546,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7830,7 +8564,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7846,7 +8582,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7861,7 +8599,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7876,7 +8616,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7891,7 +8633,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7906,7 +8650,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7921,7 +8667,9 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7937,6 +8685,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7952,6 +8701,7 @@ define amdgpu_kernel void @local_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7977,7 +8727,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7994,7 +8746,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -8010,7 +8764,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -8025,7 +8781,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -8041,7 +8799,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -8057,7 +8817,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8072,7 +8834,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8087,7 +8851,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8102,7 +8868,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8117,7 +8885,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -8132,7 +8902,9 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -8148,6 +8920,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8163,6 +8936,7 @@ define amdgpu_kernel void @local_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -8529,6 +9303,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -8543,6 +9318,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -8556,6 +9332,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -8568,6 +9345,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -8581,6 +9359,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -8594,6 +9373,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8606,6 +9386,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8618,6 +9399,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8630,6 +9412,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8642,6 +9425,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -8654,6 +9438,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -8666,6 +9451,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8678,6 +9464,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -8699,7 +9486,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -8713,7 +9502,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -8726,7 +9517,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -8738,7 +9531,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -8751,7 +9546,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -8764,7 +9561,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8776,7 +9575,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8788,7 +9589,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8800,7 +9603,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8812,7 +9617,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -8824,7 +9631,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -8836,7 +9645,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8848,7 +9659,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -9150,6 +9963,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -9161,6 +9975,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -9171,6 +9986,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -9181,6 +9997,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -9192,6 +10009,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9202,6 +10020,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9212,6 +10031,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -9222,6 +10042,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9232,6 +10053,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -9242,6 +10064,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -9252,6 +10075,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -9290,6 +10114,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -9301,6 +10126,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -9311,6 +10137,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -9321,6 +10148,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -9332,6 +10160,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9342,6 +10171,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9352,6 +10182,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -9362,6 +10193,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9372,6 +10204,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -9382,6 +10215,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -9392,6 +10226,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -9571,6 +10406,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9582,6 +10418,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9592,6 +10429,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9602,6 +10440,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9613,6 +10452,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9623,6 +10463,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9633,6 +10474,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9643,6 +10485,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9653,6 +10496,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9663,6 +10507,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9673,6 +10518,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9683,6 +10529,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acquire_atomicrmw:
@@ -9693,6 +10540,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -9710,6 +10558,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -9721,6 +10570,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -9731,6 +10581,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -9741,6 +10592,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -9752,6 +10604,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9762,6 +10615,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9772,6 +10626,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -9782,6 +10637,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9792,6 +10648,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -9802,6 +10659,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -9812,6 +10670,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -9850,7 +10709,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9861,7 +10722,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9871,7 +10734,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9881,7 +10746,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9892,7 +10759,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9902,7 +10771,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9912,7 +10783,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9922,7 +10795,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9932,7 +10807,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9942,7 +10819,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9952,7 +10831,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9963,6 +10844,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
@@ -9973,6 +10855,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -9990,7 +10873,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10001,7 +10886,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10011,7 +10898,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10021,7 +10910,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10032,7 +10923,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10042,7 +10935,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10052,7 +10947,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10062,7 +10959,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10072,7 +10971,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10082,7 +10983,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10092,7 +10995,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10103,6 +11008,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
@@ -10113,6 +11019,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10131,6 +11038,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -10146,6 +11054,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -10160,6 +11069,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -10173,6 +11083,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -10187,6 +11098,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -10201,6 +11113,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10214,6 +11127,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10227,6 +11141,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10240,6 +11155,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10253,6 +11169,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -10266,6 +11183,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -10279,6 +11197,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -10292,6 +11211,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10313,7 +11233,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -10328,7 +11250,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -10342,7 +11266,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -10355,7 +11281,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -10369,7 +11297,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -10383,7 +11313,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10396,7 +11328,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10409,7 +11343,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10422,7 +11358,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10435,7 +11373,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -10448,7 +11388,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -10462,6 +11404,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -10475,6 +11418,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10496,7 +11440,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -10511,7 +11457,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -10525,7 +11473,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -10538,7 +11488,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -10552,7 +11504,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -10566,7 +11520,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10579,7 +11535,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10592,7 +11550,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10605,7 +11565,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10618,7 +11580,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -10631,7 +11595,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -10645,6 +11611,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -10658,6 +11625,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10849,6 +11817,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10862,6 +11831,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10874,6 +11844,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10886,6 +11857,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10899,6 +11871,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10911,6 +11884,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10923,6 +11897,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10935,6 +11910,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10947,6 +11923,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10959,6 +11936,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10971,6 +11949,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10983,6 +11962,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
@@ -10995,6 +11975,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11015,6 +11996,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -11028,6 +12010,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -11040,6 +12023,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11052,6 +12036,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -11065,6 +12050,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11077,6 +12063,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11089,6 +12076,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11101,6 +12089,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11113,6 +12102,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11125,6 +12115,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11137,6 +12128,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -11182,7 +12174,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11195,7 +12189,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11207,7 +12203,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11219,7 +12217,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11232,7 +12232,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11244,7 +12246,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11256,7 +12260,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11268,7 +12274,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11280,7 +12288,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11292,7 +12302,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11304,7 +12316,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11317,6 +12331,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
@@ -11329,6 +12344,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11349,7 +12365,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11362,7 +12380,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11374,7 +12394,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11386,7 +12408,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11399,7 +12423,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11411,7 +12437,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11423,7 +12451,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11435,7 +12465,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11447,7 +12479,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11459,7 +12493,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11471,7 +12507,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11484,6 +12522,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
@@ -11496,6 +12535,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11517,6 +12557,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11530,6 +12571,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11542,6 +12584,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11554,6 +12597,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11567,6 +12611,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11579,6 +12624,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11591,6 +12637,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11603,6 +12650,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11615,6 +12663,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11627,6 +12676,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11639,6 +12689,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11651,6 +12702,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
@@ -11663,6 +12715,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11684,6 +12737,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11697,6 +12751,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11709,6 +12764,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11721,6 +12777,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11734,6 +12791,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11746,6 +12804,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11758,6 +12817,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11770,6 +12830,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11782,6 +12843,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11794,6 +12856,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11806,6 +12869,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11818,6 +12882,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
@@ -11830,6 +12895,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11850,7 +12916,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11863,7 +12931,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11875,7 +12945,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11887,7 +12959,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11900,7 +12974,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11912,7 +12988,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11924,7 +13002,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11936,7 +13016,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11948,7 +13030,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11960,7 +13044,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11972,7 +13058,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11985,6 +13073,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
@@ -11997,6 +13086,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12017,7 +13107,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12030,7 +13122,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12042,7 +13136,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12054,7 +13150,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12067,7 +13165,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12079,7 +13179,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12091,7 +13193,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12103,7 +13207,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12115,7 +13221,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12127,7 +13235,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12139,7 +13249,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12152,6 +13264,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
@@ -12164,6 +13277,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12184,7 +13298,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12197,7 +13313,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12209,7 +13327,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12221,7 +13341,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12234,7 +13356,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12246,7 +13370,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12258,7 +13384,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12270,7 +13398,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12282,7 +13412,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12294,7 +13426,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12306,7 +13440,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12319,6 +13455,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
@@ -12331,6 +13468,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12351,7 +13489,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12364,7 +13504,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12376,7 +13518,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12388,7 +13532,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12401,7 +13547,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12413,7 +13561,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12425,7 +13575,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12437,7 +13589,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12449,7 +13603,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12461,7 +13617,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12473,7 +13631,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12486,6 +13646,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
@@ -12498,6 +13659,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12518,7 +13680,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12531,7 +13695,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12543,7 +13709,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12555,7 +13723,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12568,7 +13738,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12580,7 +13752,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12592,7 +13766,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12604,7 +13780,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12616,7 +13794,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12628,7 +13808,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12640,7 +13822,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12653,6 +13837,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
@@ -12665,6 +13850,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12685,7 +13871,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12698,7 +13886,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12710,7 +13900,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12722,7 +13914,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12735,7 +13929,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12747,7 +13943,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12759,7 +13957,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12771,7 +13971,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12783,7 +13985,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12795,7 +13999,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12807,7 +14013,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12820,6 +14028,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
@@ -12832,6 +14041,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12852,7 +14062,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12865,7 +14077,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12877,7 +14091,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12889,7 +14105,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12902,7 +14120,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12914,7 +14134,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12926,7 +14148,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12938,7 +14162,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12950,7 +14176,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12962,7 +14190,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12974,7 +14204,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12987,6 +14219,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12999,6 +14232,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13019,7 +14253,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13032,7 +14268,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13044,7 +14282,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13056,7 +14296,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13069,7 +14311,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13081,7 +14325,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13093,7 +14339,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13105,7 +14353,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13117,7 +14367,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13129,7 +14381,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13141,7 +14395,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13154,6 +14410,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13166,6 +14423,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13398,6 +14656,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -13415,6 +14674,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -13431,6 +14691,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -13446,6 +14707,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -13462,6 +14724,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -13478,6 +14741,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13493,6 +14757,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13508,6 +14773,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13523,6 +14789,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13538,6 +14805,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -13553,6 +14821,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -13568,6 +14837,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -13583,6 +14853,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_monotonic_ret_cmpxc
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -13608,6 +14879,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -13625,6 +14897,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -13641,6 +14914,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -13656,6 +14930,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -13672,6 +14947,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -13688,6 +14964,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13703,6 +14980,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13718,6 +14996,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13733,6 +15012,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13748,6 +15028,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -13763,6 +15044,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_monotonic_ret_cmpxc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -13819,7 +15101,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -13836,7 +15120,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -13852,7 +15138,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -13867,7 +15155,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -13883,7 +15173,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -13899,7 +15191,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13914,7 +15208,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13929,7 +15225,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13944,7 +15242,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13959,7 +15259,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -13974,7 +15276,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -13990,6 +15294,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14005,6 +15310,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_monotonic_ret_cmpxc
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14030,7 +15336,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14047,7 +15355,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14063,7 +15373,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14078,7 +15390,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14094,7 +15408,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14110,7 +15426,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14125,7 +15443,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14140,7 +15460,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14155,7 +15477,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14170,7 +15494,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14185,7 +15511,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14201,6 +15529,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14216,6 +15545,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_monotonic_ret_cmpxc
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14242,6 +15572,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14259,6 +15590,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14275,6 +15607,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14290,6 +15623,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14306,6 +15640,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14322,6 +15657,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14337,6 +15673,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14352,6 +15689,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14367,6 +15705,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14382,6 +15721,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14397,6 +15737,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14412,6 +15753,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14427,6 +15769,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_acquire_ret_cmpxc
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14453,6 +15796,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14470,6 +15814,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14486,6 +15831,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14501,6 +15847,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14517,6 +15864,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14533,6 +15881,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14548,6 +15897,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14563,6 +15913,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14578,6 +15929,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14593,6 +15945,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14608,6 +15961,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14623,6 +15977,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14638,6 +15993,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_acquire_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14663,7 +16019,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14680,7 +16038,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14696,7 +16056,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14711,7 +16073,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14727,7 +16091,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14743,7 +16109,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14758,7 +16126,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14773,7 +16143,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14788,7 +16160,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14803,7 +16177,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14818,7 +16194,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14834,6 +16212,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14849,6 +16228,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_acquire_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14874,7 +16254,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14891,7 +16273,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14907,7 +16291,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14922,7 +16308,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14938,7 +16326,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14954,7 +16344,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14969,7 +16361,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14984,7 +16378,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14999,7 +16395,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15014,7 +16412,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15029,7 +16429,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15045,6 +16447,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15060,6 +16463,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15085,7 +16489,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15102,7 +16508,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15118,7 +16526,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15133,7 +16543,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15149,7 +16561,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15165,7 +16579,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15180,7 +16596,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15195,7 +16613,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15210,7 +16630,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15225,7 +16647,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15240,7 +16664,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15256,6 +16682,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15271,6 +16698,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15296,7 +16724,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15313,7 +16743,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15329,7 +16761,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15344,7 +16778,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15360,7 +16796,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15376,7 +16814,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15391,7 +16831,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15406,7 +16848,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15421,7 +16865,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15436,7 +16882,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15451,7 +16899,9 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15467,6 +16917,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15482,6 +16933,7 @@ define amdgpu_kernel void @local_singlethread_one_as_monotonic_seq_cst_ret_cmpxc
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15507,7 +16959,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15524,7 +16978,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15540,7 +16996,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15555,7 +17013,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15571,7 +17031,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15587,7 +17049,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15602,7 +17066,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15617,7 +17083,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15632,7 +17100,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15647,7 +17117,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15662,7 +17134,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15678,6 +17152,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15693,6 +17168,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15718,7 +17194,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15735,7 +17213,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15751,7 +17231,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15766,7 +17248,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15782,7 +17266,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15798,7 +17284,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15813,7 +17301,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15828,7 +17318,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15843,7 +17335,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15858,7 +17352,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15873,7 +17369,9 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15889,6 +17387,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15904,6 +17403,7 @@ define amdgpu_kernel void @local_singlethread_one_as_release_seq_cst_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15929,7 +17429,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15946,7 +17448,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15962,7 +17466,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15977,7 +17483,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15993,7 +17501,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16009,7 +17519,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16024,7 +17536,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16039,7 +17553,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16054,7 +17570,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16069,7 +17587,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16084,7 +17604,9 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16100,6 +17622,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16115,6 +17638,7 @@ define amdgpu_kernel void @local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16140,7 +17664,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16157,7 +17683,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16173,7 +17701,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16188,7 +17718,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16204,7 +17736,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16220,7 +17754,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16235,7 +17771,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16250,7 +17788,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16265,7 +17805,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16280,7 +17822,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16295,7 +17839,9 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16311,6 +17857,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16326,6 +17873,7 @@ define amdgpu_kernel void @local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll
index 74a297241d851..943a97b3e8b26 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll
@@ -444,6 +444,7 @@ define amdgpu_kernel void @local_system_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -469,6 +470,7 @@ define amdgpu_kernel void @local_system_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -507,7 +509,7 @@ define amdgpu_kernel void @local_system_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -520,7 +522,7 @@ define amdgpu_kernel void @local_system_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -628,6 +630,7 @@ define amdgpu_kernel void @local_system_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -655,6 +658,7 @@ define amdgpu_kernel void @local_system_seq_cst_load(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -700,7 +704,7 @@ define amdgpu_kernel void @local_system_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -712,9 +716,9 @@ define amdgpu_kernel void @local_system_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -1154,7 +1158,7 @@ define amdgpu_kernel void @local_system_release_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
@@ -1312,7 +1316,7 @@ define amdgpu_kernel void @local_system_seq_cst_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
@@ -1541,6 +1545,7 @@ define amdgpu_kernel void @local_system_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1563,6 +1568,7 @@ define amdgpu_kernel void @local_system_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1597,7 +1603,7 @@ define amdgpu_kernel void @local_system_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1609,7 +1615,7 @@ define amdgpu_kernel void @local_system_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1766,7 +1772,7 @@ define amdgpu_kernel void @local_system_release_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
@@ -1863,6 +1869,7 @@ define amdgpu_kernel void @local_system_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1887,6 +1894,7 @@ define amdgpu_kernel void @local_system_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1928,7 +1936,7 @@ define amdgpu_kernel void @local_system_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1939,9 +1947,9 @@ define amdgpu_kernel void @local_system_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -2037,6 +2045,7 @@ define amdgpu_kernel void @local_system_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -2061,6 +2070,7 @@ define amdgpu_kernel void @local_system_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -2102,7 +2112,7 @@ define amdgpu_kernel void @local_system_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2113,9 +2123,9 @@ define amdgpu_kernel void @local_system_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -2218,6 +2228,7 @@ define amdgpu_kernel void @local_system_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2245,6 +2256,7 @@ define amdgpu_kernel void @local_system_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2286,7 +2298,7 @@ define amdgpu_kernel void @local_system_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2300,7 +2312,7 @@ define amdgpu_kernel void @local_system_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2414,6 +2426,7 @@ define amdgpu_kernel void @local_system_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2443,6 +2456,7 @@ define amdgpu_kernel void @local_system_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2491,7 +2505,7 @@ define amdgpu_kernel void @local_system_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2504,9 +2518,9 @@ define amdgpu_kernel void @local_system_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2620,6 +2634,7 @@ define amdgpu_kernel void @local_system_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2649,6 +2664,7 @@ define amdgpu_kernel void @local_system_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2697,7 +2713,7 @@ define amdgpu_kernel void @local_system_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2710,9 +2726,9 @@ define amdgpu_kernel void @local_system_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2984,6 +3000,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3010,6 +3027,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3050,7 +3068,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3064,7 +3082,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3248,7 +3266,7 @@ define amdgpu_kernel void @local_system_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
@@ -3360,6 +3378,7 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3388,6 +3407,7 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3435,7 +3455,7 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3448,9 +3468,9 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3561,6 +3581,7 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3589,6 +3610,7 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3636,7 +3658,7 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3649,9 +3671,9 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3754,6 +3776,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3780,6 +3803,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3820,7 +3844,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3834,7 +3858,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3937,6 +3961,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3963,6 +3988,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4003,7 +4029,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4017,7 +4043,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4128,6 +4154,7 @@ define amdgpu_kernel void @local_system_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4156,6 +4183,7 @@ define amdgpu_kernel void @local_system_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4203,7 +4231,7 @@ define amdgpu_kernel void @local_system_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4216,9 +4244,9 @@ define amdgpu_kernel void @local_system_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4329,6 +4357,7 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4357,6 +4386,7 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4404,7 +4434,7 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4417,9 +4447,9 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4530,6 +4560,7 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4558,6 +4589,7 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4605,7 +4637,7 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4618,9 +4650,9 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4731,6 +4763,7 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4759,6 +4792,7 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4806,7 +4840,7 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4819,9 +4853,9 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4932,6 +4966,7 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4960,6 +4995,7 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5007,7 +5043,7 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5020,9 +5056,9 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5133,6 +5169,7 @@ define amdgpu_kernel void @local_system_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5161,6 +5198,7 @@ define amdgpu_kernel void @local_system_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5208,7 +5246,7 @@ define amdgpu_kernel void @local_system_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5221,9 +5259,9 @@ define amdgpu_kernel void @local_system_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5334,6 +5372,7 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5362,6 +5401,7 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5409,7 +5449,7 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5422,9 +5462,9 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5535,6 +5575,7 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5563,6 +5604,7 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5610,7 +5652,7 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5623,9 +5665,9 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5954,6 +5996,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5985,6 +6028,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6032,7 +6076,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6048,7 +6092,7 @@ define amdgpu_kernel void @local_system_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6275,7 +6319,7 @@ define amdgpu_kernel void @local_system_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
@@ -6407,6 +6451,7 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6440,6 +6485,7 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6494,7 +6540,7 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6509,9 +6555,9 @@ define amdgpu_kernel void @local_system_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6641,6 +6687,7 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6674,6 +6721,7 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6728,7 +6776,7 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6743,9 +6791,9 @@ define amdgpu_kernel void @local_system_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6867,6 +6915,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6898,6 +6947,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6945,7 +6995,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6961,7 +7011,7 @@ define amdgpu_kernel void @local_system_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7083,6 +7133,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7114,6 +7165,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7161,7 +7213,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7177,7 +7229,7 @@ define amdgpu_kernel void @local_system_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7307,6 +7359,7 @@ define amdgpu_kernel void @local_system_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7340,6 +7393,7 @@ define amdgpu_kernel void @local_system_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7394,7 +7448,7 @@ define amdgpu_kernel void @local_system_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7409,9 +7463,9 @@ define amdgpu_kernel void @local_system_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7541,6 +7595,7 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7574,6 +7629,7 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7628,7 +7684,7 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7643,9 +7699,9 @@ define amdgpu_kernel void @local_system_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7775,6 +7831,7 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7808,6 +7865,7 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7862,7 +7920,7 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7877,9 +7935,9 @@ define amdgpu_kernel void @local_system_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8009,6 +8067,7 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8042,6 +8101,7 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8096,7 +8156,7 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8111,9 +8171,9 @@ define amdgpu_kernel void @local_system_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8243,6 +8303,7 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8276,6 +8337,7 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8330,7 +8392,7 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8345,9 +8407,9 @@ define amdgpu_kernel void @local_system_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8477,6 +8539,7 @@ define amdgpu_kernel void @local_system_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8510,6 +8573,7 @@ define amdgpu_kernel void @local_system_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8564,7 +8628,7 @@ define amdgpu_kernel void @local_system_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8579,9 +8643,9 @@ define amdgpu_kernel void @local_system_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8711,6 +8775,7 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8744,6 +8809,7 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8798,7 +8864,7 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8813,9 +8879,9 @@ define amdgpu_kernel void @local_system_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8945,6 +9011,7 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8978,6 +9045,7 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -9032,7 +9100,7 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9047,9 +9115,9 @@ define amdgpu_kernel void @local_system_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -9415,6 +9483,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -9429,6 +9498,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -9442,6 +9512,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -9454,6 +9525,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -9467,6 +9539,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -9480,6 +9553,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9492,6 +9566,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9504,6 +9579,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9516,6 +9592,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9528,6 +9605,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -9540,6 +9618,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -9552,6 +9631,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9564,6 +9644,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -9585,7 +9666,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -9599,7 +9682,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -9612,7 +9697,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -9624,7 +9711,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -9637,7 +9726,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -9650,7 +9741,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9662,7 +9755,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9674,7 +9769,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9686,7 +9783,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9698,7 +9797,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -9710,7 +9811,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -9722,7 +9825,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9734,7 +9839,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10036,6 +10143,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10047,6 +10155,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10057,6 +10166,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10067,6 +10177,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10078,6 +10189,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10088,6 +10200,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10098,6 +10211,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10108,6 +10222,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10118,6 +10233,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10128,6 +10244,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10138,6 +10255,7 @@ define amdgpu_kernel void @local_system_one_as_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10176,6 +10294,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10187,6 +10306,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10197,6 +10317,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10207,6 +10328,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10218,6 +10340,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10228,6 +10351,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10238,6 +10362,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10248,6 +10373,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10258,6 +10384,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10268,6 +10395,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10278,6 +10406,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10457,6 +10586,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10468,6 +10598,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10478,6 +10609,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10488,6 +10620,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10499,6 +10632,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10509,6 +10643,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10519,6 +10654,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10529,6 +10665,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10539,6 +10676,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10549,6 +10687,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10559,6 +10698,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10569,6 +10709,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acquire_atomicrmw:
@@ -10579,6 +10720,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10596,6 +10738,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10607,6 +10750,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10617,6 +10761,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10627,6 +10772,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10638,6 +10784,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10648,6 +10795,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10658,6 +10806,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10668,6 +10817,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10678,6 +10828,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10688,6 +10839,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10698,6 +10850,7 @@ define amdgpu_kernel void @local_system_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10736,7 +10889,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10747,7 +10902,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10757,7 +10914,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10767,7 +10926,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10778,7 +10939,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10788,7 +10951,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10798,7 +10963,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10808,7 +10975,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10818,7 +10987,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10828,7 +10999,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10838,7 +11011,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10849,6 +11024,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acq_rel_atomicrmw:
@@ -10859,6 +11035,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10876,7 +11053,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10887,7 +11066,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10897,7 +11078,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10907,7 +11090,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10918,7 +11103,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10928,7 +11115,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10938,7 +11127,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10948,7 +11139,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10958,7 +11151,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10968,7 +11163,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10978,7 +11175,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10989,6 +11188,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_seq_cst_atomicrmw:
@@ -10999,6 +11199,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -11017,6 +11218,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11032,6 +11234,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11046,6 +11249,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11059,6 +11263,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11073,6 +11278,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11087,6 +11293,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11100,6 +11307,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11113,6 +11321,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11126,6 +11335,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11139,6 +11349,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11152,6 +11363,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11165,6 +11377,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11178,6 +11391,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11199,7 +11413,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11214,7 +11430,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11228,7 +11446,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11241,7 +11461,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11255,7 +11477,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11269,7 +11493,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11282,7 +11508,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11295,7 +11523,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11308,7 +11538,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11321,7 +11553,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11334,7 +11568,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11348,6 +11584,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11361,6 +11598,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11382,7 +11620,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11397,7 +11637,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11411,7 +11653,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11424,7 +11668,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11438,7 +11684,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11452,7 +11700,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11465,7 +11715,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11478,7 +11730,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11491,7 +11745,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11504,7 +11760,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11517,7 +11775,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11531,6 +11791,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11544,6 +11805,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11735,6 +11997,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11748,6 +12011,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11760,6 +12024,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11772,6 +12037,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11785,6 +12051,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11797,6 +12064,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11809,6 +12077,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11821,6 +12090,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11833,6 +12103,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11845,6 +12116,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11857,6 +12129,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11869,6 +12142,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
@@ -11881,6 +12155,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11901,6 +12176,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -11914,6 +12190,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -11926,6 +12203,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11938,6 +12216,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -11951,6 +12230,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11963,6 +12243,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11975,6 +12256,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11987,6 +12269,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11999,6 +12282,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -12011,6 +12295,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -12023,6 +12308,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -12068,7 +12354,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12081,7 +12369,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12093,7 +12383,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12105,7 +12397,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12118,7 +12412,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12130,7 +12426,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12142,7 +12440,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12154,7 +12454,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12166,7 +12468,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12178,7 +12482,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12190,7 +12496,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12203,6 +12511,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
@@ -12215,6 +12524,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12235,7 +12545,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12248,7 +12560,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12260,7 +12574,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12272,7 +12588,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12285,7 +12603,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12297,7 +12617,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12309,7 +12631,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12321,7 +12645,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12333,7 +12659,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12345,7 +12673,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12357,7 +12687,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12370,6 +12702,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
@@ -12382,6 +12715,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12403,6 +12737,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12416,6 +12751,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12428,6 +12764,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12440,6 +12777,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12453,6 +12791,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12465,6 +12804,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12477,6 +12817,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12489,6 +12830,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12501,6 +12843,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12513,6 +12856,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12525,6 +12869,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12537,6 +12882,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
@@ -12549,6 +12895,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12570,6 +12917,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12583,6 +12931,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12595,6 +12944,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12607,6 +12957,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12620,6 +12971,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12632,6 +12984,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12644,6 +12997,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12656,6 +13010,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12668,6 +13023,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12680,6 +13036,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12692,6 +13049,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12704,6 +13062,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
@@ -12716,6 +13075,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12736,7 +13096,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12749,7 +13111,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12761,7 +13125,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12773,7 +13139,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12786,7 +13154,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12798,7 +13168,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12810,7 +13182,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12822,7 +13196,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12834,7 +13210,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12846,7 +13224,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12858,7 +13238,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12871,6 +13253,7 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_release_acquire_cmpxchg:
@@ -12883,6 +13266,7 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12903,7 +13287,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12916,7 +13302,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12928,7 +13316,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12940,7 +13330,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12953,7 +13345,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12965,7 +13359,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12977,7 +13373,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -12989,7 +13387,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -13001,7 +13401,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -13013,7 +13415,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -13025,7 +13429,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -13038,6 +13444,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
@@ -13050,6 +13457,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13070,7 +13478,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13083,7 +13493,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13095,7 +13507,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13107,7 +13521,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13120,7 +13536,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13132,7 +13550,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13144,7 +13564,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13156,7 +13578,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13168,7 +13592,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13180,7 +13606,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13192,7 +13620,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13205,6 +13635,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
@@ -13217,6 +13648,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13237,7 +13669,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13250,7 +13684,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13262,7 +13698,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13274,7 +13712,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13287,7 +13727,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13299,7 +13741,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13311,7 +13755,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13323,7 +13769,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13335,7 +13783,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13347,7 +13797,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13359,7 +13811,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13372,6 +13826,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
@@ -13384,6 +13839,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13404,7 +13860,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13417,7 +13875,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13429,7 +13889,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13441,7 +13903,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13454,7 +13918,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13466,7 +13932,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13478,7 +13946,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13490,7 +13960,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13502,7 +13974,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13514,7 +13988,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13526,7 +14002,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13539,6 +14017,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
@@ -13551,6 +14030,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13571,7 +14051,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13584,7 +14066,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13596,7 +14080,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13608,7 +14094,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13621,7 +14109,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13633,7 +14123,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13645,7 +14137,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13657,7 +14151,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13669,7 +14165,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13681,7 +14179,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13693,7 +14193,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13706,6 +14208,7 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
@@ -13718,6 +14221,7 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13738,7 +14242,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13751,7 +14257,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13763,7 +14271,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13775,7 +14285,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13788,7 +14300,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13800,7 +14314,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13812,7 +14328,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13824,7 +14342,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13836,7 +14356,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13848,7 +14370,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13860,7 +14384,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13873,6 +14399,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13885,6 +14412,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13905,7 +14433,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13918,7 +14448,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13930,7 +14462,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13942,7 +14476,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13955,7 +14491,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13967,7 +14505,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13979,7 +14519,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13991,7 +14533,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14003,7 +14547,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14015,7 +14561,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14027,7 +14575,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14040,6 +14590,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14052,6 +14603,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -14284,6 +14836,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14301,6 +14854,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14317,6 +14871,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14332,6 +14887,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14348,6 +14904,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14364,6 +14921,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14379,6 +14937,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14394,6 +14953,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14409,6 +14969,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14424,6 +14985,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14439,6 +15001,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14454,6 +15017,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14469,6 +15033,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14494,6 +15059,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -14511,6 +15077,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -14527,6 +15094,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -14542,6 +15110,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -14558,6 +15127,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -14574,6 +15144,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14589,6 +15160,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14604,6 +15176,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14619,6 +15192,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14634,6 +15208,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -14649,6 +15224,7 @@ define amdgpu_kernel void @local_system_one_as_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -14705,7 +15281,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14722,7 +15300,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14738,7 +15318,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14753,7 +15335,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14769,7 +15353,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14785,7 +15371,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14800,7 +15388,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14815,7 +15405,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14830,7 +15422,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14845,7 +15439,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14860,7 +15456,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14876,6 +15474,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14891,6 +15490,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14916,7 +15516,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14933,7 +15535,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14949,7 +15553,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14964,7 +15570,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14980,7 +15588,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14996,7 +15606,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15011,7 +15623,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15026,7 +15640,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15041,7 +15657,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15056,7 +15674,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15071,7 +15691,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15087,6 +15709,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15102,6 +15725,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15128,6 +15752,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15145,6 +15770,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15161,6 +15787,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15176,6 +15803,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15192,6 +15820,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15208,6 +15837,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15223,6 +15853,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15238,6 +15869,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15253,6 +15885,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15268,6 +15901,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15283,6 +15917,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15298,6 +15933,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15313,6 +15949,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15339,6 +15976,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15356,6 +15994,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15372,6 +16011,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15387,6 +16027,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15403,6 +16044,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15419,6 +16061,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15434,6 +16077,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15449,6 +16093,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15464,6 +16109,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15479,6 +16125,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15494,6 +16141,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15509,6 +16157,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15524,6 +16173,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15549,7 +16199,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15566,7 +16218,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15582,7 +16236,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15597,7 +16253,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15613,7 +16271,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15629,7 +16289,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15644,7 +16306,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15659,7 +16323,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15674,7 +16340,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15689,7 +16357,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15704,7 +16374,9 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15720,6 +16392,7 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15735,6 +16408,7 @@ define amdgpu_kernel void @local_system_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15760,7 +16434,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15777,7 +16453,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15793,7 +16471,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15808,7 +16488,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15824,7 +16506,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15840,7 +16524,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15855,7 +16541,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15870,7 +16558,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15885,7 +16575,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15900,7 +16592,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15915,7 +16609,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15931,6 +16627,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15946,6 +16643,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15971,7 +16669,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15988,7 +16688,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16004,7 +16706,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16019,7 +16723,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16035,7 +16741,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16051,7 +16759,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16066,7 +16776,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16081,7 +16793,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16096,7 +16810,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16111,7 +16827,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16126,7 +16844,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16142,6 +16862,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16157,6 +16878,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16182,7 +16904,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16199,7 +16923,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16215,7 +16941,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16230,7 +16958,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16246,7 +16976,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16262,7 +16994,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16277,7 +17011,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16292,7 +17028,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16307,7 +17045,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16322,7 +17062,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16337,7 +17079,9 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16353,6 +17097,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16368,6 +17113,7 @@ define amdgpu_kernel void @local_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16393,7 +17139,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16410,7 +17158,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16426,7 +17176,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16441,7 +17193,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16457,7 +17211,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16473,7 +17229,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16488,7 +17246,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16503,7 +17263,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16518,7 +17280,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16533,7 +17297,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16548,7 +17314,9 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16564,6 +17332,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16579,6 +17348,7 @@ define amdgpu_kernel void @local_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16604,7 +17374,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16621,7 +17393,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16637,7 +17411,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16652,7 +17428,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16668,7 +17446,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16684,7 +17464,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16699,7 +17481,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16714,7 +17498,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16729,7 +17515,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16744,7 +17532,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16759,7 +17549,9 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16775,6 +17567,7 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16790,6 +17583,7 @@ define amdgpu_kernel void @local_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16815,7 +17609,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16832,7 +17628,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16848,7 +17646,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16863,7 +17663,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16879,7 +17681,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16895,7 +17699,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16910,7 +17716,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16925,7 +17733,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16940,7 +17750,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16955,7 +17767,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16970,7 +17784,9 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16986,6 +17802,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -17001,6 +17818,7 @@ define amdgpu_kernel void @local_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -17026,7 +17844,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -17043,7 +17863,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -17059,7 +17881,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -17074,7 +17898,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -17090,7 +17916,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -17106,7 +17934,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17121,7 +17951,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17136,7 +17968,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17151,7 +17985,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17166,7 +18002,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -17181,7 +18019,9 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -17197,6 +18037,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -17212,6 +18053,7 @@ define amdgpu_kernel void @local_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll
index bc2508411ed6b..2856b74508f37 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll
@@ -43,6 +43,7 @@ define amdgpu_kernel void @local_volatile_load_0(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: ds_read_b32 v2, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -126,7 +127,7 @@ define amdgpu_kernel void @local_volatile_load_0(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: ds_load_b32 v1, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-WGP-NEXT: s_endpgm
;
@@ -138,7 +139,7 @@ define amdgpu_kernel void @local_volatile_load_0(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: ds_load_b32 v1, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %in, ptr addrspace(1) %out) {
@@ -186,6 +187,7 @@ define amdgpu_kernel void @local_volatile_load_1(
; GFX7-NEXT: v_add_i32_e64 v0, s[6:7], s6, v0
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: ds_read_b32 v2, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -288,7 +290,7 @@ define amdgpu_kernel void @local_volatile_load_1(
; GFX12-WGP-NEXT: s_wait_alu 0xfffe
; GFX12-WGP-NEXT: v_lshl_add_u32 v1, v1, s2, s3
; GFX12-WGP-NEXT: ds_load_b32 v1, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-WGP-NEXT: s_endpgm
;
@@ -305,7 +307,7 @@ define amdgpu_kernel void @local_volatile_load_1(
; GFX12-CU-NEXT: s_wait_alu 0xfffe
; GFX12-CU-NEXT: v_lshl_add_u32 v1, v1, s2, s3
; GFX12-CU-NEXT: ds_load_b32 v1, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: global_store_b32 v0, v1, s[0:1]
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %in, ptr addrspace(1) %out) {
@@ -330,6 +332,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v1, s0
; GFX6-NEXT: ds_write_b32 v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_volatile_store_0:
@@ -343,6 +346,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_write_b32 v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_volatile_store_0:
@@ -355,6 +359,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_volatile_store_0:
@@ -367,6 +372,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_write_b32 v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_volatile_store_0:
@@ -380,6 +386,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_volatile_store_0:
@@ -392,6 +399,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_volatile_store_0:
@@ -404,6 +412,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_store_b32 v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_volatile_store_0:
@@ -421,6 +430,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_volatile_store_0:
@@ -438,6 +448,7 @@ define amdgpu_kernel void @local_volatile_store_0(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %in, ptr addrspace(3) %out) {
entry:
@@ -461,6 +472,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v1, s0
; GFX6-NEXT: ds_write_b32 v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_volatile_store_1:
@@ -476,6 +488,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_write_b32 v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_volatile_store_1:
@@ -489,6 +502,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_volatile_store_1:
@@ -502,6 +516,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_write_b32 v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_volatile_store_1:
@@ -517,6 +532,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_volatile_store_1:
@@ -532,6 +548,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_volatile_store_1:
@@ -547,6 +564,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_store_b32 v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_volatile_store_1:
@@ -568,6 +586,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_volatile_store_1:
@@ -589,6 +608,7 @@ define amdgpu_kernel void @local_volatile_store_1(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %in, ptr addrspace(3) %out) {
entry:
@@ -701,7 +721,7 @@ define amdgpu_kernel void @local_volatile_workgroup_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -714,7 +734,7 @@ define amdgpu_kernel void @local_volatile_workgroup_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -830,7 +850,7 @@ define amdgpu_kernel void @local_volatile_workgroup_release_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-wavefront.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-wavefront.ll
index b24622a48a16b..e166efe334400 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-wavefront.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-wavefront.ll
@@ -366,6 +366,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -380,6 +381,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -393,6 +395,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -405,6 +408,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -418,6 +422,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -431,6 +436,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -443,6 +449,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -455,6 +462,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -467,6 +475,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -479,6 +488,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -491,6 +501,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -503,6 +514,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -515,6 +527,7 @@ define amdgpu_kernel void @local_wavefront_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -536,7 +549,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -550,7 +565,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -563,7 +580,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -575,7 +594,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -588,7 +609,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -601,7 +624,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -613,7 +638,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -625,7 +652,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -637,7 +666,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -649,7 +680,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -661,7 +694,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -673,7 +708,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -685,7 +722,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -987,6 +1026,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -998,6 +1038,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -1008,6 +1049,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1018,6 +1060,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -1029,6 +1072,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1039,6 +1083,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1049,6 +1094,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1059,6 +1105,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1069,6 +1116,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1079,6 +1127,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1089,6 +1138,7 @@ define amdgpu_kernel void @local_wavefront_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -1127,6 +1177,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -1138,6 +1189,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -1148,6 +1200,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1158,6 +1211,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -1169,6 +1223,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1179,6 +1234,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1189,6 +1245,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1199,6 +1256,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1209,6 +1267,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1219,6 +1278,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1229,6 +1289,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -1408,6 +1469,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1419,6 +1481,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1429,6 +1492,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1439,6 +1503,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1450,6 +1515,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1460,6 +1526,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1470,6 +1537,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1480,6 +1548,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1490,6 +1559,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1500,6 +1570,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1510,6 +1581,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1520,6 +1592,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acquire_atomicrmw:
@@ -1530,6 +1603,7 @@ define amdgpu_kernel void @local_wavefront_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1547,6 +1621,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -1558,6 +1633,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -1568,6 +1644,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -1578,6 +1655,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -1589,6 +1667,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -1599,6 +1678,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1609,6 +1689,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1619,6 +1700,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -1629,6 +1711,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1639,6 +1722,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -1649,6 +1733,7 @@ define amdgpu_kernel void @local_wavefront_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -1687,7 +1772,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1698,7 +1785,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1708,7 +1797,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1718,7 +1809,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1729,7 +1822,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1739,7 +1834,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1749,7 +1846,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1759,7 +1858,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1769,7 +1870,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1779,7 +1882,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1789,7 +1894,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1800,6 +1907,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acq_rel_atomicrmw:
@@ -1810,6 +1918,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1827,7 +1936,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1838,7 +1949,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1848,7 +1961,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1858,7 +1973,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1869,7 +1986,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1879,7 +1998,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1889,7 +2010,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1899,7 +2022,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1909,7 +2034,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1919,7 +2046,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1929,7 +2058,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1940,6 +2071,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_seq_cst_atomicrmw:
@@ -1950,6 +2082,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1968,6 +2101,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -1983,6 +2117,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -1997,6 +2132,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -2010,6 +2146,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -2024,6 +2161,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -2038,6 +2176,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2051,6 +2190,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2064,6 +2204,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2077,6 +2218,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2090,6 +2232,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -2103,6 +2246,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -2116,6 +2260,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2129,6 +2274,7 @@ define amdgpu_kernel void @local_wavefront_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -2150,7 +2296,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -2165,7 +2313,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -2179,7 +2329,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -2192,7 +2344,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -2206,7 +2360,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -2220,7 +2376,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2233,7 +2391,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2246,7 +2406,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2259,7 +2421,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2272,7 +2436,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -2285,7 +2451,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -2299,6 +2467,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2312,6 +2481,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -2333,7 +2503,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -2348,7 +2520,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -2362,7 +2536,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -2375,7 +2551,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -2389,7 +2567,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -2403,7 +2583,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2416,7 +2598,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2429,7 +2613,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2442,7 +2628,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -2455,7 +2643,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -2468,7 +2658,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -2482,6 +2674,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2495,6 +2688,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -2686,6 +2880,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2699,6 +2894,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2711,6 +2907,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2723,6 +2920,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2736,6 +2934,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2748,6 +2947,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2760,6 +2960,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2772,6 +2973,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2784,6 +2986,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2796,6 +2999,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2808,6 +3012,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2820,6 +3025,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
@@ -2832,6 +3038,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -2852,6 +3059,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -2865,6 +3073,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -2877,6 +3086,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -2889,6 +3099,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -2902,6 +3113,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -2914,6 +3126,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -2926,6 +3139,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -2938,6 +3152,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -2950,6 +3165,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -2962,6 +3178,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -2974,6 +3191,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -3019,7 +3237,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3032,7 +3252,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3044,7 +3266,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3056,7 +3280,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3069,7 +3295,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3081,7 +3309,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3093,7 +3323,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3105,7 +3337,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3117,7 +3351,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3129,7 +3365,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3141,7 +3379,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3154,6 +3394,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
@@ -3166,6 +3407,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3186,7 +3428,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3199,7 +3443,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3211,7 +3457,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3223,7 +3471,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3236,7 +3486,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3248,7 +3500,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3260,7 +3514,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3272,7 +3528,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3284,7 +3542,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3296,7 +3556,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3308,7 +3570,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3321,6 +3585,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
@@ -3333,6 +3598,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3354,6 +3620,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3367,6 +3634,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3379,6 +3647,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3391,6 +3660,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3404,6 +3674,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3416,6 +3687,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3428,6 +3700,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3440,6 +3713,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3452,6 +3726,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3464,6 +3739,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3476,6 +3752,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3488,6 +3765,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
@@ -3500,6 +3778,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3521,6 +3800,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3534,6 +3814,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3546,6 +3827,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3558,6 +3840,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3571,6 +3854,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3583,6 +3867,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3595,6 +3880,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3607,6 +3893,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3619,6 +3906,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3631,6 +3919,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3643,6 +3932,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3655,6 +3945,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acquire_acquire_cmpxchg:
@@ -3667,6 +3958,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3687,7 +3979,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3700,7 +3994,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3712,7 +4008,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3724,7 +4022,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3737,7 +4037,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3749,7 +4051,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3761,7 +4065,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3773,7 +4079,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3785,7 +4093,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3797,7 +4107,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3809,7 +4121,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3822,6 +4136,7 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_release_acquire_cmpxchg:
@@ -3834,6 +4149,7 @@ define amdgpu_kernel void @local_wavefront_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3854,7 +4170,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3867,7 +4185,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3879,7 +4199,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3891,7 +4213,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3904,7 +4228,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3916,7 +4242,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3928,7 +4256,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3940,7 +4270,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3952,7 +4284,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3964,7 +4298,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3976,7 +4312,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -3989,6 +4327,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
@@ -4001,6 +4340,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4021,7 +4361,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4034,7 +4376,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4046,7 +4390,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4058,7 +4404,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4071,7 +4419,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4083,7 +4433,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4095,7 +4447,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4107,7 +4461,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4119,7 +4475,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4131,7 +4489,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4143,7 +4503,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4156,6 +4518,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
@@ -4168,6 +4531,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4188,7 +4552,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4201,7 +4567,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4213,7 +4581,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4225,7 +4595,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4238,7 +4610,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4250,7 +4624,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4262,7 +4638,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4274,7 +4652,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4286,7 +4666,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4298,7 +4680,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4310,7 +4694,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4323,6 +4709,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
@@ -4335,6 +4722,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4355,7 +4743,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4368,7 +4758,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4380,7 +4772,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4392,7 +4786,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4405,7 +4801,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4417,7 +4815,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4429,7 +4829,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4441,7 +4843,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4453,7 +4857,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4465,7 +4871,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4477,7 +4885,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4490,6 +4900,7 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
@@ -4502,6 +4913,7 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4522,7 +4934,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4535,7 +4949,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4547,7 +4963,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4559,7 +4977,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4572,7 +4992,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4584,7 +5006,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4596,7 +5020,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4608,7 +5034,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4620,7 +5048,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4632,7 +5062,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4644,7 +5076,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4657,6 +5091,7 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_release_seq_cst_cmpxchg:
@@ -4669,6 +5104,7 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4689,7 +5125,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4702,7 +5140,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4714,7 +5154,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4726,7 +5168,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4739,7 +5183,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4751,7 +5197,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4763,7 +5211,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4775,7 +5225,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4787,7 +5239,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4799,7 +5253,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4811,7 +5267,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4824,6 +5282,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
@@ -4836,6 +5295,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4856,7 +5316,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4869,7 +5331,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4881,7 +5345,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4893,7 +5359,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4906,7 +5374,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4918,7 +5388,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4930,7 +5402,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4942,7 +5416,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4954,7 +5430,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4966,7 +5444,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4978,7 +5458,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -4991,6 +5473,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
@@ -5003,6 +5486,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5235,6 +5719,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -5252,6 +5737,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -5268,6 +5754,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -5283,6 +5770,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -5299,6 +5787,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -5315,6 +5804,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5330,6 +5820,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5345,6 +5836,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5360,6 +5852,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5375,6 +5868,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -5390,6 +5884,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -5405,6 +5900,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -5420,6 +5916,7 @@ define amdgpu_kernel void @local_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -5445,6 +5942,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -5462,6 +5960,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -5478,6 +5977,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -5493,6 +5993,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -5509,6 +6010,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -5525,6 +6027,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5540,6 +6043,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5555,6 +6059,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5570,6 +6075,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5585,6 +6091,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -5600,6 +6107,7 @@ define amdgpu_kernel void @local_wavefront_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -5656,7 +6164,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -5673,7 +6183,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -5689,7 +6201,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -5704,7 +6218,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -5720,7 +6236,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -5736,7 +6254,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5751,7 +6271,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5766,7 +6288,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5781,7 +6305,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5796,7 +6322,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -5811,7 +6339,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -5827,6 +6357,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -5842,6 +6373,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -5867,7 +6399,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -5884,7 +6418,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -5900,7 +6436,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -5915,7 +6453,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -5931,7 +6471,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -5947,7 +6489,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5962,7 +6506,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5977,7 +6523,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -5992,7 +6540,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6007,7 +6557,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6022,7 +6574,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6038,6 +6592,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6053,6 +6608,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6079,6 +6635,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6096,6 +6653,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6112,6 +6670,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6127,6 +6686,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6143,6 +6703,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6159,6 +6720,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6174,6 +6736,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6189,6 +6752,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6204,6 +6768,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6219,6 +6784,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6234,6 +6800,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6249,6 +6816,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6264,6 +6832,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6290,6 +6859,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6307,6 +6877,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6323,6 +6894,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6338,6 +6910,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6354,6 +6927,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6370,6 +6944,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6385,6 +6960,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6400,6 +6976,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6415,6 +6992,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6430,6 +7008,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6445,6 +7024,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6460,6 +7040,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6475,6 +7056,7 @@ define amdgpu_kernel void @local_wavefront_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6500,7 +7082,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6517,7 +7101,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6533,7 +7119,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6548,7 +7136,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6564,7 +7154,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6580,7 +7172,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6595,7 +7189,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6610,7 +7206,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6625,7 +7223,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6640,7 +7240,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6655,7 +7257,9 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6671,6 +7275,7 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6686,6 +7291,7 @@ define amdgpu_kernel void @local_wavefront_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6711,7 +7317,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6728,7 +7336,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6744,7 +7354,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6759,7 +7371,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6775,7 +7389,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -6791,7 +7407,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6806,7 +7424,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6821,7 +7441,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6836,7 +7458,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -6851,7 +7475,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -6866,7 +7492,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -6882,6 +7510,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6897,6 +7526,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -6922,7 +7552,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -6939,7 +7571,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -6955,7 +7589,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -6970,7 +7606,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -6986,7 +7624,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7002,7 +7642,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7017,7 +7659,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7032,7 +7676,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7047,7 +7693,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7062,7 +7710,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7077,7 +7727,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7093,6 +7745,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7108,6 +7761,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7133,7 +7787,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7150,7 +7806,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7166,7 +7824,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7181,7 +7841,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7197,7 +7859,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7213,7 +7877,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7228,7 +7894,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7243,7 +7911,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7258,7 +7928,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7273,7 +7945,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7288,7 +7962,9 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7304,6 +7980,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7319,6 +7996,7 @@ define amdgpu_kernel void @local_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7344,7 +8022,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7361,7 +8041,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7377,7 +8059,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7392,7 +8076,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7408,7 +8094,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7424,7 +8112,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7439,7 +8129,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7454,7 +8146,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7469,7 +8163,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7484,7 +8180,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7499,7 +8197,9 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7515,6 +8215,7 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7530,6 +8231,7 @@ define amdgpu_kernel void @local_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7555,7 +8257,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7572,7 +8276,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7588,7 +8294,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7603,7 +8311,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7619,7 +8329,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7635,7 +8347,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7650,7 +8364,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7665,7 +8381,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7680,7 +8398,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7695,7 +8415,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7710,7 +8432,9 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7726,6 +8450,7 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7741,6 +8466,7 @@ define amdgpu_kernel void @local_wavefront_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7766,7 +8492,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7783,7 +8511,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -7799,7 +8529,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -7814,7 +8546,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -7830,7 +8564,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -7846,7 +8582,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7861,7 +8599,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7876,7 +8616,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7891,7 +8633,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -7906,7 +8650,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -7921,7 +8667,9 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -7937,6 +8685,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7952,6 +8701,7 @@ define amdgpu_kernel void @local_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -7977,7 +8727,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -7994,7 +8746,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -8010,7 +8764,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -8025,7 +8781,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -8041,7 +8799,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -8057,7 +8817,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8072,7 +8834,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8087,7 +8851,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8102,7 +8868,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8117,7 +8885,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -8132,7 +8902,9 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -8148,6 +8920,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8163,6 +8936,7 @@ define amdgpu_kernel void @local_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -8529,6 +9303,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -8543,6 +9318,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -8556,6 +9332,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -8568,6 +9345,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -8581,6 +9359,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -8594,6 +9373,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8606,6 +9386,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8618,6 +9399,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8630,6 +9412,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8642,6 +9425,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -8654,6 +9438,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -8666,6 +9451,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8678,6 +9464,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -8699,7 +9486,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -8713,7 +9502,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -8726,7 +9517,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -8738,7 +9531,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -8751,7 +9546,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -8764,7 +9561,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8776,7 +9575,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8788,7 +9589,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8800,7 +9603,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -8812,7 +9617,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -8824,7 +9631,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -8836,7 +9645,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8848,7 +9659,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -9150,6 +9963,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -9161,6 +9975,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -9171,6 +9986,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -9181,6 +9997,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -9192,6 +10009,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9202,6 +10020,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9212,6 +10031,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -9222,6 +10042,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9232,6 +10053,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -9242,6 +10064,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -9252,6 +10075,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -9290,6 +10114,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -9301,6 +10126,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -9311,6 +10137,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -9321,6 +10148,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -9332,6 +10160,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9342,6 +10171,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9352,6 +10182,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -9362,6 +10193,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9372,6 +10204,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -9382,6 +10215,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -9392,6 +10226,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -9571,6 +10406,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9582,6 +10418,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9592,6 +10429,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9602,6 +10440,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9613,6 +10452,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9623,6 +10463,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9633,6 +10474,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9643,6 +10485,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9653,6 +10496,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9663,6 +10507,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9673,6 +10518,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9683,6 +10529,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acquire_atomicrmw:
@@ -9693,6 +10540,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -9710,6 +10558,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -9721,6 +10570,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -9731,6 +10581,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -9741,6 +10592,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -9752,6 +10604,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -9762,6 +10615,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9772,6 +10626,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -9782,6 +10637,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -9792,6 +10648,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -9802,6 +10659,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -9812,6 +10670,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -9850,7 +10709,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9861,7 +10722,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9871,7 +10734,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9881,7 +10746,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9892,7 +10759,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9902,7 +10771,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9912,7 +10783,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9922,7 +10795,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9932,7 +10807,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9942,7 +10819,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9952,7 +10831,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9963,6 +10844,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
@@ -9973,6 +10855,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -9990,7 +10873,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10001,7 +10886,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10011,7 +10898,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10021,7 +10910,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10032,7 +10923,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10042,7 +10935,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10052,7 +10947,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10062,7 +10959,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10072,7 +10971,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10082,7 +10983,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10092,7 +10995,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10103,6 +11008,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
@@ -10113,6 +11019,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10131,6 +11038,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -10146,6 +11054,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -10160,6 +11069,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -10173,6 +11083,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -10187,6 +11098,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -10201,6 +11113,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10214,6 +11127,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10227,6 +11141,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10240,6 +11155,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10253,6 +11169,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -10266,6 +11183,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -10279,6 +11197,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -10292,6 +11211,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10313,7 +11233,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -10328,7 +11250,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -10342,7 +11266,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -10355,7 +11281,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -10369,7 +11297,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -10383,7 +11313,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10396,7 +11328,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10409,7 +11343,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10422,7 +11358,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10435,7 +11373,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -10448,7 +11388,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -10462,6 +11404,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -10475,6 +11418,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10496,7 +11440,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -10511,7 +11457,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -10525,7 +11473,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -10538,7 +11488,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -10552,7 +11504,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -10566,7 +11520,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10579,7 +11535,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10592,7 +11550,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10605,7 +11565,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -10618,7 +11580,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -10631,7 +11595,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -10645,6 +11611,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -10658,6 +11625,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10849,6 +11817,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10862,6 +11831,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10874,6 +11844,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10886,6 +11857,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10899,6 +11871,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10911,6 +11884,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10923,6 +11897,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10935,6 +11910,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10947,6 +11923,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10959,6 +11936,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10971,6 +11949,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10983,6 +11962,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
@@ -10995,6 +11975,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11015,6 +11996,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -11028,6 +12010,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -11040,6 +12023,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11052,6 +12036,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -11065,6 +12050,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11077,6 +12063,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11089,6 +12076,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11101,6 +12089,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11113,6 +12102,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -11125,6 +12115,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -11137,6 +12128,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -11182,7 +12174,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11195,7 +12189,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11207,7 +12203,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11219,7 +12217,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11232,7 +12232,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11244,7 +12246,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11256,7 +12260,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11268,7 +12274,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11280,7 +12288,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11292,7 +12302,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11304,7 +12316,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11317,6 +12331,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
@@ -11329,6 +12344,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11349,7 +12365,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11362,7 +12380,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11374,7 +12394,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11386,7 +12408,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11399,7 +12423,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11411,7 +12437,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11423,7 +12451,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11435,7 +12465,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11447,7 +12479,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11459,7 +12493,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11471,7 +12507,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11484,6 +12522,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
@@ -11496,6 +12535,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11517,6 +12557,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11530,6 +12571,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11542,6 +12584,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11554,6 +12597,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11567,6 +12611,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11579,6 +12624,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11591,6 +12637,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11603,6 +12650,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11615,6 +12663,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11627,6 +12676,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11639,6 +12689,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11651,6 +12702,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
@@ -11663,6 +12715,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11684,6 +12737,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11697,6 +12751,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11709,6 +12764,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11721,6 +12777,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11734,6 +12791,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11746,6 +12804,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11758,6 +12817,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11770,6 +12830,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11782,6 +12843,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11794,6 +12856,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11806,6 +12869,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11818,6 +12882,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
@@ -11830,6 +12895,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11850,7 +12916,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11863,7 +12931,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11875,7 +12945,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11887,7 +12959,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11900,7 +12974,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11912,7 +12988,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11924,7 +13002,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11936,7 +13016,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11948,7 +13030,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11960,7 +13044,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11972,7 +13058,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11985,6 +13073,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
@@ -11997,6 +13086,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12017,7 +13107,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12030,7 +13122,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12042,7 +13136,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12054,7 +13150,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12067,7 +13165,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12079,7 +13179,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12091,7 +13193,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12103,7 +13207,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12115,7 +13221,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12127,7 +13235,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12139,7 +13249,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12152,6 +13264,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
@@ -12164,6 +13277,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12184,7 +13298,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12197,7 +13313,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12209,7 +13327,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12221,7 +13341,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12234,7 +13356,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12246,7 +13370,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12258,7 +13384,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12270,7 +13398,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12282,7 +13412,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12294,7 +13426,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12306,7 +13440,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12319,6 +13455,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
@@ -12331,6 +13468,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12351,7 +13489,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12364,7 +13504,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12376,7 +13518,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12388,7 +13532,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12401,7 +13547,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12413,7 +13561,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12425,7 +13575,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12437,7 +13589,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12449,7 +13603,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12461,7 +13617,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12473,7 +13631,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12486,6 +13646,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
@@ -12498,6 +13659,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12518,7 +13680,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12531,7 +13695,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12543,7 +13709,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12555,7 +13723,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12568,7 +13738,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12580,7 +13752,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12592,7 +13766,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12604,7 +13780,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12616,7 +13794,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12628,7 +13808,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12640,7 +13822,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12653,6 +13837,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
@@ -12665,6 +13850,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12685,7 +13871,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12698,7 +13886,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12710,7 +13900,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12722,7 +13914,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12735,7 +13929,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12747,7 +13943,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12759,7 +13957,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12771,7 +13971,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12783,7 +13985,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12795,7 +13999,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12807,7 +14013,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12820,6 +14028,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
@@ -12832,6 +14041,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12852,7 +14062,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12865,7 +14077,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12877,7 +14091,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12889,7 +14105,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12902,7 +14120,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12914,7 +14134,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12926,7 +14148,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12938,7 +14162,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12950,7 +14176,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12962,7 +14190,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12974,7 +14204,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12987,6 +14219,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
@@ -12999,6 +14232,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13019,7 +14253,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13032,7 +14268,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13044,7 +14282,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13056,7 +14296,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13069,7 +14311,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13081,7 +14325,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13093,7 +14339,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13105,7 +14353,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13117,7 +14367,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13129,7 +14381,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13141,7 +14395,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13154,6 +14410,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13166,6 +14423,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13398,6 +14656,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -13415,6 +14674,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -13431,6 +14691,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -13446,6 +14707,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -13462,6 +14724,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -13478,6 +14741,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13493,6 +14757,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13508,6 +14773,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13523,6 +14789,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13538,6 +14805,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -13553,6 +14821,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -13568,6 +14837,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -13583,6 +14853,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -13608,6 +14879,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -13625,6 +14897,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -13641,6 +14914,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -13656,6 +14930,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -13672,6 +14947,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -13688,6 +14964,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13703,6 +14980,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13718,6 +14996,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13733,6 +15012,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -13748,6 +15028,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -13763,6 +15044,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -13819,7 +15101,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -13836,7 +15120,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -13852,7 +15138,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -13867,7 +15155,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -13883,7 +15173,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -13899,7 +15191,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13914,7 +15208,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13929,7 +15225,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13944,7 +15242,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -13959,7 +15259,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -13974,7 +15276,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -13990,6 +15294,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14005,6 +15310,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14030,7 +15336,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14047,7 +15355,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14063,7 +15373,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14078,7 +15390,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14094,7 +15408,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14110,7 +15426,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14125,7 +15443,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14140,7 +15460,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14155,7 +15477,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14170,7 +15494,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14185,7 +15511,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14201,6 +15529,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14216,6 +15545,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14242,6 +15572,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14259,6 +15590,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14275,6 +15607,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14290,6 +15623,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14306,6 +15640,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14322,6 +15657,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14337,6 +15673,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14352,6 +15689,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14367,6 +15705,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14382,6 +15721,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14397,6 +15737,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14412,6 +15753,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14427,6 +15769,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14453,6 +15796,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14470,6 +15814,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14486,6 +15831,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14501,6 +15847,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14517,6 +15864,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14533,6 +15881,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14548,6 +15897,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14563,6 +15913,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14578,6 +15929,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14593,6 +15945,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14608,6 +15961,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14623,6 +15977,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14638,6 +15993,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14663,7 +16019,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14680,7 +16038,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14696,7 +16056,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14711,7 +16073,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14727,7 +16091,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14743,7 +16109,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14758,7 +16126,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14773,7 +16143,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14788,7 +16160,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14803,7 +16177,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14818,7 +16194,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14834,6 +16212,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14849,6 +16228,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14874,7 +16254,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14891,7 +16273,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14907,7 +16291,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14922,7 +16308,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14938,7 +16326,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14954,7 +16344,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14969,7 +16361,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14984,7 +16378,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14999,7 +16395,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15014,7 +16412,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15029,7 +16429,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15045,6 +16447,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15060,6 +16463,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15085,7 +16489,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15102,7 +16508,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15118,7 +16526,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15133,7 +16543,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15149,7 +16561,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15165,7 +16579,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15180,7 +16596,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15195,7 +16613,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15210,7 +16630,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15225,7 +16647,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15240,7 +16664,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15256,6 +16682,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15271,6 +16698,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15296,7 +16724,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15313,7 +16743,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15329,7 +16761,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15344,7 +16778,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15360,7 +16796,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15376,7 +16814,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15391,7 +16831,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15406,7 +16848,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15421,7 +16865,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15436,7 +16882,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15451,7 +16899,9 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15467,6 +16917,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15482,6 +16933,7 @@ define amdgpu_kernel void @local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15507,7 +16959,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15524,7 +16978,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15540,7 +16996,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15555,7 +17013,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15571,7 +17031,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15587,7 +17049,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15602,7 +17066,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15617,7 +17083,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15632,7 +17100,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15647,7 +17117,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15662,7 +17134,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15678,6 +17152,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15693,6 +17168,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15718,7 +17194,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15735,7 +17213,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15751,7 +17231,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15766,7 +17248,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15782,7 +17266,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15798,7 +17284,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15813,7 +17301,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15828,7 +17318,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15843,7 +17335,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15858,7 +17352,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15873,7 +17369,9 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15889,6 +17387,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15904,6 +17403,7 @@ define amdgpu_kernel void @local_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15929,7 +17429,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15946,7 +17448,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15962,7 +17466,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15977,7 +17483,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15993,7 +17501,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16009,7 +17519,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16024,7 +17536,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16039,7 +17553,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16054,7 +17570,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16069,7 +17587,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16084,7 +17604,9 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16100,6 +17622,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16115,6 +17638,7 @@ define amdgpu_kernel void @local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16140,7 +17664,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16157,7 +17683,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16173,7 +17701,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16188,7 +17718,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16204,7 +17736,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16220,7 +17754,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16235,7 +17771,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16250,7 +17788,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16265,7 +17805,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16280,7 +17822,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16295,7 +17839,9 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16311,6 +17857,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16326,6 +17873,7 @@ define amdgpu_kernel void @local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
index 62d7f4801baf8..2e14246b8aa58 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
@@ -444,6 +444,7 @@ define amdgpu_kernel void @local_workgroup_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -469,6 +470,7 @@ define amdgpu_kernel void @local_workgroup_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -507,7 +509,7 @@ define amdgpu_kernel void @local_workgroup_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -520,7 +522,7 @@ define amdgpu_kernel void @local_workgroup_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -628,6 +630,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -655,6 +658,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_load(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -700,7 +704,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_load(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -712,9 +716,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_load_b32 v1, v0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -1154,7 +1158,7 @@ define amdgpu_kernel void @local_workgroup_release_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
@@ -1312,7 +1316,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_store(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
i32 %in, ptr addrspace(3) %out) {
@@ -1541,6 +1545,7 @@ define amdgpu_kernel void @local_workgroup_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1563,6 +1568,7 @@ define amdgpu_kernel void @local_workgroup_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1597,7 +1603,7 @@ define amdgpu_kernel void @local_workgroup_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1609,7 +1615,7 @@ define amdgpu_kernel void @local_workgroup_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -1766,7 +1772,7 @@ define amdgpu_kernel void @local_workgroup_release_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
@@ -1863,6 +1869,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -1887,6 +1894,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -1928,7 +1936,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -1939,9 +1947,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -2037,6 +2045,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -2061,6 +2070,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -2102,7 +2112,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -2113,9 +2123,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -2218,6 +2228,7 @@ define amdgpu_kernel void @local_workgroup_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2245,6 +2256,7 @@ define amdgpu_kernel void @local_workgroup_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2286,7 +2298,7 @@ define amdgpu_kernel void @local_workgroup_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2300,7 +2312,7 @@ define amdgpu_kernel void @local_workgroup_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2414,6 +2426,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2443,6 +2456,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2491,7 +2505,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2504,9 +2518,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2620,6 +2634,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2649,6 +2664,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -2697,7 +2713,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -2710,9 +2726,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -2984,6 +3000,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3010,6 +3027,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3050,7 +3068,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3064,7 +3082,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3248,7 +3266,7 @@ define amdgpu_kernel void @local_workgroup_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
@@ -3360,6 +3378,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3388,6 +3407,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3435,7 +3455,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3448,9 +3468,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3561,6 +3581,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3589,6 +3610,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3636,7 +3658,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3649,9 +3671,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3754,6 +3776,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3780,6 +3803,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -3820,7 +3844,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -3834,7 +3858,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -3937,6 +3961,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -3963,6 +3988,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4003,7 +4029,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4017,7 +4043,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4128,6 +4154,7 @@ define amdgpu_kernel void @local_workgroup_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4156,6 +4183,7 @@ define amdgpu_kernel void @local_workgroup_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4203,7 +4231,7 @@ define amdgpu_kernel void @local_workgroup_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4216,9 +4244,9 @@ define amdgpu_kernel void @local_workgroup_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4329,6 +4357,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4357,6 +4386,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4404,7 +4434,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4417,9 +4447,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4530,6 +4560,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4558,6 +4589,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4605,7 +4637,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4618,9 +4650,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4731,6 +4763,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4759,6 +4792,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -4806,7 +4840,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -4819,9 +4853,9 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -4932,6 +4966,7 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -4960,6 +4995,7 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5007,7 +5043,7 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5020,9 +5056,9 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5133,6 +5169,7 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5161,6 +5198,7 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5208,7 +5246,7 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5221,9 +5259,9 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5334,6 +5372,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5362,6 +5401,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5409,7 +5449,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5422,9 +5462,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5535,6 +5575,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -5563,6 +5604,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -5610,7 +5652,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: s_endpgm
;
@@ -5623,9 +5665,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -5954,6 +5996,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -5985,6 +6028,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6032,7 +6076,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6048,7 +6092,7 @@ define amdgpu_kernel void @local_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6275,7 +6319,7 @@ define amdgpu_kernel void @local_workgroup_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
@@ -6407,6 +6451,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6440,6 +6485,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6494,7 +6540,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6509,9 +6555,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6641,6 +6687,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6674,6 +6721,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6728,7 +6776,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6743,9 +6791,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -6867,6 +6915,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6898,6 +6947,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -6945,7 +6995,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -6961,7 +7011,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7083,6 +7133,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7114,6 +7165,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7161,7 +7213,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7177,7 +7229,7 @@ define amdgpu_kernel void @local_workgroup_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7307,6 +7359,7 @@ define amdgpu_kernel void @local_workgroup_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7340,6 +7393,7 @@ define amdgpu_kernel void @local_workgroup_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7394,7 +7448,7 @@ define amdgpu_kernel void @local_workgroup_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7409,9 +7463,9 @@ define amdgpu_kernel void @local_workgroup_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7541,6 +7595,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7574,6 +7629,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7628,7 +7684,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7643,9 +7699,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -7775,6 +7831,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7808,6 +7865,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -7862,7 +7920,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -7877,9 +7935,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8009,6 +8067,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8042,6 +8101,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8096,7 +8156,7 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8111,9 +8171,9 @@ define amdgpu_kernel void @local_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8243,6 +8303,7 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8276,6 +8337,7 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8330,7 +8392,7 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8345,9 +8407,9 @@ define amdgpu_kernel void @local_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8477,6 +8539,7 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8510,6 +8573,7 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8564,7 +8628,7 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8579,9 +8643,9 @@ define amdgpu_kernel void @local_workgroup_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8711,6 +8775,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8744,6 +8809,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8798,7 +8864,7 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -8813,9 +8879,9 @@ define amdgpu_kernel void @local_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -8945,6 +9011,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -8978,6 +9045,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: buffer_inv sc0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -9032,7 +9100,7 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-WGP-NEXT: s_wait_dscnt 0x0
+; GFX12-WGP-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-WGP-NEXT: global_inv scope:SCOPE_SE
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9047,9 +9115,9 @@ define amdgpu_kernel void @local_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
-; GFX12-CU-NEXT: s_wait_dscnt 0x0
+; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x3f00
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
; GFX12-CU-NEXT: s_endpgm
@@ -9415,6 +9483,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -9429,6 +9498,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -9442,6 +9512,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -9454,6 +9525,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -9467,6 +9539,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -9480,6 +9553,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9492,6 +9566,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9504,6 +9579,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9516,6 +9592,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9528,6 +9605,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -9540,6 +9618,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -9552,6 +9631,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9564,6 +9644,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_load(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -9585,7 +9666,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_read_b32 v1, v0
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -9599,7 +9682,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_read_b32 v1, v0
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -9612,7 +9697,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX10-WGP-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_read_b32 v1, v0
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -9624,7 +9711,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX10-CU-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_read_b32 v1, v0
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -9637,7 +9726,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_read_b32 v1, v0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -9650,7 +9741,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX90A-NOTTGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9662,7 +9755,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX90A-TGSPLIT-NEXT: s_load_dword s4, s[8:9], 0x4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9674,7 +9769,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX942-NOTTGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9686,7 +9783,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX942-TGSPLIT-NEXT: s_load_dword s0, s[4:5], 0x4
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_read_b32 v1, v0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -9698,7 +9797,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX11-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_load_b32 v1, v0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -9710,7 +9811,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX11-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_load_b32 v1, v0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -9722,7 +9825,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX12-WGP-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: ds_load_b32 v1, v0
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -9734,7 +9839,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_load(
; GFX12-CU-NEXT: s_load_b32 s0, s[4:5], 0x4
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: ds_load_b32 v1, v0
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -10036,6 +10143,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10047,6 +10155,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10057,6 +10166,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10067,6 +10177,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10078,6 +10189,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10088,6 +10200,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10098,6 +10211,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10108,6 +10222,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10118,6 +10233,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10128,6 +10244,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10138,6 +10255,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10176,6 +10294,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_write_b32 v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10187,6 +10306,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_write_b32 v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10197,6 +10317,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10207,6 +10328,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10218,6 +10340,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_write_b32 v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10228,6 +10351,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10238,6 +10362,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10248,6 +10373,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10258,6 +10384,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10268,6 +10395,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10278,6 +10406,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_store(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10457,6 +10586,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10468,6 +10598,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10478,6 +10609,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10488,6 +10620,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10499,6 +10632,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10509,6 +10643,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10519,6 +10654,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10529,6 +10665,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10539,6 +10676,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10549,6 +10687,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10559,6 +10698,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10569,6 +10709,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acquire_atomicrmw:
@@ -10579,6 +10720,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10596,6 +10738,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX6-NEXT: s_endpgm
;
@@ -10607,6 +10750,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX7-NEXT: s_endpgm
;
@@ -10617,6 +10761,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-WGP-NEXT: s_endpgm
;
@@ -10627,6 +10772,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX10-CU-NEXT: s_endpgm
;
@@ -10638,6 +10784,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -10648,6 +10795,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10658,6 +10806,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -10668,6 +10817,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -10678,6 +10828,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -10688,6 +10839,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-WGP-NEXT: s_endpgm
;
@@ -10698,6 +10850,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
; GFX11-CU-NEXT: s_endpgm
;
@@ -10736,7 +10889,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10747,7 +10902,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10757,7 +10914,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10767,7 +10926,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10778,7 +10939,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10788,7 +10951,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10798,7 +10963,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10808,7 +10975,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10818,7 +10987,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10828,7 +10999,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10838,7 +11011,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10849,6 +11024,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
@@ -10859,6 +11035,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -10876,7 +11053,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s5
; GFX6-NEXT: v_mov_b32_e32 v1, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10887,7 +11066,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s5
; GFX7-NEXT: v_mov_b32_e32 v1, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10897,7 +11078,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10907,7 +11090,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10918,7 +11103,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10928,7 +11115,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10938,7 +11127,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10948,7 +11139,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10958,7 +11151,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10968,7 +11163,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10978,7 +11175,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10989,6 +11188,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s0
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
@@ -10999,6 +11199,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s0
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v0, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in) {
entry:
@@ -11017,6 +11218,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11032,6 +11234,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11046,6 +11249,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11059,6 +11263,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11073,6 +11278,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11087,6 +11293,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11100,6 +11307,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11113,6 +11321,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11126,6 +11335,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11139,6 +11349,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11152,6 +11363,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11165,6 +11377,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11178,6 +11391,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11199,7 +11413,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11214,7 +11430,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11228,7 +11446,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11241,7 +11461,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11255,7 +11477,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11269,7 +11493,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11282,7 +11508,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11295,7 +11523,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11308,7 +11538,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11321,7 +11553,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11334,7 +11568,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11348,6 +11584,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11361,6 +11598,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11382,7 +11620,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -11397,7 +11637,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -11411,7 +11653,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -11424,7 +11668,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -11438,7 +11684,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -11452,7 +11700,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11465,7 +11715,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11478,7 +11730,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11491,7 +11745,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -11504,7 +11760,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -11517,7 +11775,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -11531,6 +11791,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -11544,6 +11805,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: ds_storexchg_rtn_b32 v1, v0, v1
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -11735,6 +11997,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11748,6 +12011,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11760,6 +12024,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11772,6 +12037,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11785,6 +12051,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11797,6 +12064,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11809,6 +12077,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11821,6 +12090,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11833,6 +12103,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11845,6 +12116,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11857,6 +12129,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11869,6 +12142,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
@@ -11881,6 +12155,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -11901,6 +12176,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX6-NEXT: s_endpgm
;
@@ -11914,6 +12190,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX7-NEXT: s_endpgm
;
@@ -11926,6 +12203,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-WGP-NEXT: s_endpgm
;
@@ -11938,6 +12216,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX10-CU-NEXT: s_endpgm
;
@@ -11951,6 +12230,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_endpgm
;
@@ -11963,6 +12243,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11975,6 +12256,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
@@ -11987,6 +12269,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
@@ -11999,6 +12282,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: s_endpgm
;
@@ -12011,6 +12295,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-WGP-NEXT: s_endpgm
;
@@ -12023,6 +12308,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
; GFX11-CU-NEXT: s_endpgm
;
@@ -12068,7 +12354,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12081,7 +12369,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12093,7 +12383,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12105,7 +12397,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12118,7 +12412,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12130,7 +12426,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12142,7 +12440,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12154,7 +12454,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12166,7 +12468,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12178,7 +12482,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12190,7 +12496,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12203,6 +12511,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
@@ -12215,6 +12524,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12235,7 +12545,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12248,7 +12560,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12260,7 +12574,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12272,7 +12588,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12285,7 +12603,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12297,7 +12617,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12309,7 +12631,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12321,7 +12645,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12333,7 +12659,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12345,7 +12673,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12357,7 +12687,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12370,6 +12702,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
@@ -12382,6 +12715,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12403,6 +12737,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12416,6 +12751,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12428,6 +12764,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12440,6 +12777,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12453,6 +12791,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12465,6 +12804,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12477,6 +12817,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12489,6 +12830,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12501,6 +12843,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12513,6 +12856,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12525,6 +12869,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12537,6 +12882,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
@@ -12549,6 +12895,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12570,6 +12917,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12583,6 +12931,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12595,6 +12944,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12607,6 +12957,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12620,6 +12971,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12632,6 +12984,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12644,6 +12997,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12656,6 +13010,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12668,6 +13023,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12680,6 +13036,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12692,6 +13049,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12704,6 +13062,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
@@ -12716,6 +13075,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12736,7 +13096,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12749,7 +13111,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12761,7 +13125,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12773,7 +13139,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12786,7 +13154,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12798,7 +13168,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12810,7 +13182,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12822,7 +13196,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12834,7 +13210,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12846,7 +13224,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12858,7 +13238,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12871,6 +13253,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
@@ -12883,6 +13266,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -12903,7 +13287,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12916,7 +13302,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12928,7 +13316,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12940,7 +13330,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12953,7 +13345,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12965,7 +13359,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12977,7 +13373,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -12989,7 +13387,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -13001,7 +13401,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -13013,7 +13415,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -13025,7 +13429,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -13038,6 +13444,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
@@ -13050,6 +13457,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13070,7 +13478,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13083,7 +13493,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13095,7 +13507,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13107,7 +13521,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13120,7 +13536,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13132,7 +13550,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13144,7 +13564,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13156,7 +13578,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13168,7 +13592,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13180,7 +13606,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13192,7 +13620,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13205,6 +13635,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
@@ -13217,6 +13648,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13237,7 +13669,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13250,7 +13684,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13262,7 +13698,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13274,7 +13712,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13287,7 +13727,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13299,7 +13741,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13311,7 +13755,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13323,7 +13769,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13335,7 +13783,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13347,7 +13797,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13359,7 +13811,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13372,6 +13826,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
@@ -13384,6 +13839,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13404,7 +13860,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13417,7 +13875,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13429,7 +13889,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13441,7 +13903,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13454,7 +13918,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13466,7 +13932,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13478,7 +13946,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13490,7 +13960,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13502,7 +13974,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13514,7 +13988,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13526,7 +14002,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13539,6 +14017,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
@@ -13551,6 +14030,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13571,7 +14051,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13584,7 +14066,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13596,7 +14080,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13608,7 +14094,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13621,7 +14109,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13633,7 +14123,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13645,7 +14137,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13657,7 +14151,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13669,7 +14165,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13681,7 +14179,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13693,7 +14193,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13706,6 +14208,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
@@ -13718,6 +14221,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13738,7 +14242,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13751,7 +14257,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13763,7 +14271,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13775,7 +14285,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13788,7 +14300,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13800,7 +14314,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13812,7 +14328,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13824,7 +14342,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13836,7 +14356,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13848,7 +14370,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13860,7 +14384,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13873,6 +14399,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
@@ -13885,6 +14412,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -13905,7 +14433,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s6
; GFX6-NEXT: v_mov_b32_e32 v1, s5
; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_endpgm
;
; GFX7-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13918,7 +14448,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s4
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_endpgm
;
; GFX10-WGP-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13930,7 +14462,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_endpgm
;
; GFX10-CU-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13942,7 +14476,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s4
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_endpgm
;
; SKIP-CACHE-INV-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13955,7 +14491,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s0
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_endpgm
;
; GFX90A-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13967,7 +14505,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX90A-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13979,7 +14519,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s5
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: s_endpgm
;
; GFX942-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -13991,7 +14533,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: s_endpgm
;
; GFX942-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14003,7 +14547,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: s_endpgm
;
; GFX11-WGP-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14015,7 +14561,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_endpgm
;
; GFX11-CU-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14027,7 +14575,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_endpgm
;
; GFX12-WGP-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14040,6 +14590,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s1
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s0
; GFX12-WGP-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
@@ -14052,6 +14603,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s1
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s0
; GFX12-CU-NEXT: ds_cmpstore_b32 v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(3) %out, i32 %in, i32 %old) {
entry:
@@ -14284,6 +14836,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14301,6 +14854,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14317,6 +14871,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14332,6 +14887,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14348,6 +14904,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14364,6 +14921,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14379,6 +14937,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14394,6 +14953,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14409,6 +14969,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14424,6 +14985,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14439,6 +15001,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14454,6 +15017,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14469,6 +15033,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14494,6 +15059,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
@@ -14511,6 +15077,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
@@ -14527,6 +15094,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -14542,6 +15110,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -14558,6 +15127,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
@@ -14574,6 +15144,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14589,6 +15160,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14604,6 +15176,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14619,6 +15192,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
@@ -14634,6 +15208,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
@@ -14649,6 +15224,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
@@ -14705,7 +15281,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14722,7 +15300,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14738,7 +15318,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14753,7 +15335,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14769,7 +15353,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14785,7 +15371,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14800,7 +15388,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14815,7 +15405,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14830,7 +15422,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -14845,7 +15439,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -14860,7 +15456,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -14876,6 +15474,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -14891,6 +15490,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -14916,7 +15516,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -14933,7 +15535,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -14949,7 +15553,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -14964,7 +15570,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -14980,7 +15588,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -14996,7 +15606,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15011,7 +15623,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15026,7 +15640,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15041,7 +15657,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15056,7 +15674,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15071,7 +15691,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15087,6 +15709,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15102,6 +15725,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15128,6 +15752,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15145,6 +15770,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15161,6 +15787,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15176,6 +15803,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15192,6 +15820,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15208,6 +15837,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15223,6 +15853,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15238,6 +15869,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15253,6 +15885,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15268,6 +15901,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15283,6 +15917,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15298,6 +15933,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15313,6 +15949,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15339,6 +15976,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15356,6 +15994,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15372,6 +16011,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15387,6 +16027,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15403,6 +16044,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15419,6 +16061,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15434,6 +16077,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15449,6 +16093,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15464,6 +16109,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15479,6 +16125,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15494,6 +16141,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15509,6 +16157,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15524,6 +16173,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15549,7 +16199,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15566,7 +16218,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15582,7 +16236,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15597,7 +16253,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15613,7 +16271,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15629,7 +16289,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15644,7 +16306,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15659,7 +16323,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15674,7 +16340,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15689,7 +16357,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15704,7 +16374,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15720,6 +16392,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15735,6 +16408,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15760,7 +16434,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15777,7 +16453,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -15793,7 +16471,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -15808,7 +16488,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -15824,7 +16506,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -15840,7 +16524,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15855,7 +16541,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15870,7 +16558,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15885,7 +16575,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -15900,7 +16592,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -15915,7 +16609,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -15931,6 +16627,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -15946,6 +16643,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -15971,7 +16669,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -15988,7 +16688,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16004,7 +16706,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16019,7 +16723,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16035,7 +16741,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16051,7 +16759,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16066,7 +16776,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16081,7 +16793,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16096,7 +16810,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16111,7 +16827,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16126,7 +16844,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16142,6 +16862,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16157,6 +16878,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16182,7 +16904,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16199,7 +16923,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16215,7 +16941,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16230,7 +16958,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16246,7 +16976,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16262,7 +16994,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16277,7 +17011,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16292,7 +17028,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16307,7 +17045,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16322,7 +17062,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16337,7 +17079,9 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16353,6 +17097,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16368,6 +17113,7 @@ define amdgpu_kernel void @local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16393,7 +17139,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16410,7 +17158,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16426,7 +17176,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16441,7 +17193,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16457,7 +17211,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16473,7 +17229,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16488,7 +17246,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16503,7 +17263,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16518,7 +17280,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16533,7 +17297,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16548,7 +17314,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16564,6 +17332,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16579,6 +17348,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16604,7 +17374,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16621,7 +17393,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16637,7 +17411,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16652,7 +17428,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16668,7 +17446,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16684,7 +17464,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16699,7 +17481,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16714,7 +17498,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16729,7 +17515,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16744,7 +17532,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16759,7 +17549,9 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16775,6 +17567,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -16790,6 +17583,7 @@ define amdgpu_kernel void @local_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -16815,7 +17609,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -16832,7 +17628,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -16848,7 +17646,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -16863,7 +17663,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -16879,7 +17681,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -16895,7 +17699,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16910,7 +17716,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16925,7 +17733,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16940,7 +17750,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -16955,7 +17767,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -16970,7 +17784,9 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -16986,6 +17802,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -17001,6 +17818,7 @@ define amdgpu_kernel void @local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
@@ -17026,7 +17844,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: v_mov_b32_e32 v1, s6
; GFX6-NEXT: v_mov_b32_e32 v2, s5
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX6-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX6-NEXT: s_mov_b32 m0, -1
; GFX6-NEXT: v_mov_b32_e32 v0, s4
; GFX6-NEXT: s_waitcnt lgkmcnt(0)
@@ -17043,7 +17863,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mov_b32_e32 v2, s5
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX7-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; GFX7-NEXT: s_mov_b32 m0, -1
; GFX7-NEXT: v_mov_b32_e32 v0, s4
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
@@ -17059,7 +17881,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s6
; GFX10-WGP-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: ds_write_b32 v0, v1
@@ -17074,7 +17898,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s6
; GFX10-CU-NEXT: v_mov_b32_e32 v2, s5
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: ds_write_b32 v0, v1
@@ -17090,7 +17916,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v1, s2
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v2, s1
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(15) expcnt(7) lgkmcnt(15)
; SKIP-CACHE-INV-NEXT: s_mov_b32 m0, -1
; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, s0
; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
@@ -17106,7 +17934,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17121,7 +17951,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s6
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s5
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17136,7 +17968,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17151,7 +17985,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX942-TGSPLIT-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(15)
; GFX942-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX942-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX942-TGSPLIT-NEXT: ds_write_b32 v0, v1
@@ -17166,7 +18002,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: ds_store_b32 v0, v1
@@ -17181,7 +18019,9 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: ds_store_b32 v0, v1
@@ -17197,6 +18037,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-WGP-NEXT: v_mov_b32_e32 v1, s2
; GFX12-WGP-NEXT: v_mov_b32_e32 v2, s1
; GFX12-WGP-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: ds_store_b32 v0, v1
@@ -17212,6 +18053,7 @@ define amdgpu_kernel void @local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: v_mov_b32_e32 v1, s2
; GFX12-CU-NEXT: v_mov_b32_e32 v2, s1
; GFX12-CU-NEXT: ds_cmpstore_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: ds_store_b32 v0, v1
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local.mir b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local.mir
index 56dd95e373dc6..0101ef8f8455a 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-local.mir
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-local.mir
@@ -70,6 +70,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("singlethread-one-as") acquire (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -97,7 +98,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("singlethread-one-as") seq_cst (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -182,6 +185,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("wavefront-one-as") acquire (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -209,7 +213,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("wavefront-one-as") seq_cst (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -294,6 +300,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("workgroup-one-as") acquire (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -321,7 +328,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("workgroup-one-as") seq_cst (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -406,6 +415,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("agent-one-as") acquire (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -433,7 +443,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("agent-one-as") seq_cst (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -518,6 +530,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("one-as") acquire (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -545,7 +558,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 0, implicit $m0, implicit $exec :: (volatile load syncscope("one-as") seq_cst (s32) from `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -622,6 +637,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") release (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -646,6 +662,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") seq_cst (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -718,6 +735,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("wavefront-one-as") release (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -742,6 +760,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("wavefront-one-as") seq_cst (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -814,6 +833,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("workgroup-one-as") release (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -838,6 +858,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("workgroup-one-as") seq_cst (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -910,6 +931,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("agent-one-as") release (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -934,6 +956,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("agent-one-as") seq_cst (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1006,6 +1029,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("one-as") release (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1030,6 +1054,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("one-as") seq_cst (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1103,6 +1128,7 @@ body: |
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") acquire (s32) into `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
$sgpr0 = S_LOAD_DWORD_IMM killed $sgpr0_sgpr1, 40, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, align 8, addrspace 4)
@@ -1126,6 +1152,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") release (s32) into `ptr addrspace(3) poison`, addrspace 3)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1150,7 +1177,9 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") acq_rel (s32) into `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
$sgpr0 = S_LOAD_DWORD_IMM killed $sgpr0_sgpr1, 40, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, align 8, addrspace 4)
@@ -1174,7 +1203,9 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 0, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") seq_cst (s32) into `ptr addrspace(3) poison`, addrspace 3)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
$sgpr0 = S_LOAD_DWORD_IMM killed $sgpr0_sgpr1, 40, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, align 8, addrspace 4)
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
index df4193969f8a0..4be8892ca9d7c 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
@@ -396,6 +396,7 @@ define amdgpu_kernel void @private_volatile_store_0(
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s5
; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s4
; GFX10-WGP-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: s_endpgm
;
@@ -411,6 +412,7 @@ define amdgpu_kernel void @private_volatile_store_0(
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s5
; GFX10-CU-NEXT: v_mov_b32_e32 v1, s4
; GFX10-CU-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: s_endpgm
;
@@ -442,6 +444,7 @@ define amdgpu_kernel void @private_volatile_store_0(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s1
; GFX11-WGP-NEXT: scratch_store_b32 off, v0, s0 dlc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: s_endpgm
;
@@ -454,6 +457,7 @@ define amdgpu_kernel void @private_volatile_store_0(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s1
; GFX11-CU-NEXT: scratch_store_b32 off, v0, s0 dlc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: s_endpgm
;
@@ -471,6 +475,7 @@ define amdgpu_kernel void @private_volatile_store_0(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: scratch_store_b32 off, v0, s0 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
@@ -488,6 +493,7 @@ define amdgpu_kernel void @private_volatile_store_0(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %in, ptr addrspace(5) %out) {
@@ -549,6 +555,7 @@ define amdgpu_kernel void @private_volatile_store_1(
; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s4
; GFX10-WGP-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX10-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-WGP-NEXT: s_endpgm
;
@@ -565,6 +572,7 @@ define amdgpu_kernel void @private_volatile_store_1(
; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-CU-NEXT: v_mov_b32_e32 v0, s4
; GFX10-CU-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX10-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-CU-NEXT: s_endpgm
;
@@ -602,6 +610,7 @@ define amdgpu_kernel void @private_volatile_store_1(
; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s0
; GFX11-WGP-NEXT: scratch_store_b32 v1, v0, off dlc
+; GFX11-WGP-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-WGP-NEXT: s_endpgm
;
@@ -618,6 +627,7 @@ define amdgpu_kernel void @private_volatile_store_1(
; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-CU-NEXT: v_mov_b32_e32 v0, s0
; GFX11-CU-NEXT: scratch_store_b32 v1, v0, off dlc
+; GFX11-CU-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-CU-NEXT: s_endpgm
;
@@ -640,6 +650,7 @@ define amdgpu_kernel void @private_volatile_store_1(
; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: scratch_store_b32 v1, v0, s0 scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
; GFX12-WGP-NEXT: s_wait_storecnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
@@ -662,6 +673,7 @@ define amdgpu_kernel void @private_volatile_store_1(
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: scratch_store_b32 v1, v0, s0 scope:SCOPE_SYS
+; GFX12-CU-NEXT: s_wait_loadcnt 0x3f
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_endpgm
ptr addrspace(1) %in, ptr addrspace(5) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-region.mir b/llvm/test/CodeGen/AMDGPU/memory-legalizer-region.mir
index 36a244f6250db..2f09fcb0b6b2c 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-region.mir
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-region.mir
@@ -70,6 +70,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("singlethread-one-as") acquire (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -97,7 +98,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("singlethread-one-as") seq_cst (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -182,6 +185,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("wavefront-one-as") acquire (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -209,7 +213,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("wavefront-one-as") seq_cst (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -294,6 +300,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("workgroup-one-as") acquire (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -321,7 +328,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("workgroup-one-as") seq_cst (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -406,6 +415,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("agent-one-as") acquire (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -433,7 +443,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("agent-one-as") seq_cst (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -518,6 +530,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("one-as") acquire (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -545,7 +558,9 @@ body: |
; GCN-NEXT: $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed $sgpr0_sgpr1, 44, 0 :: (dereferenceable invariant load (s64) from `ptr addrspace(4) poison`, align 4, addrspace 4)
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: renamable $vgpr2 = DS_READ_B32 killed renamable $vgpr0, 0, 1, implicit $m0, implicit $exec :: (volatile load syncscope("one-as") seq_cst (s32) from `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $sgpr0_sgpr1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $sgpr0_sgpr1, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORD killed renamable $vgpr0_vgpr1, killed renamable $vgpr2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr poison`)
@@ -622,6 +637,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") release (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -646,6 +662,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") seq_cst (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -718,6 +735,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("wavefront-one-as") release (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -742,6 +760,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("wavefront-one-as") seq_cst (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -814,6 +833,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("workgroup-one-as") release (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -838,6 +858,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("workgroup-one-as") seq_cst (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -910,6 +931,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("agent-one-as") release (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -934,6 +956,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("agent-one-as") seq_cst (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1006,6 +1029,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("one-as") release (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1030,6 +1054,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: DS_WRITE_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("one-as") seq_cst (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1103,6 +1128,7 @@ body: |
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") acquire (s32) into `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
$sgpr0 = S_LOAD_DWORD_IMM killed $sgpr0_sgpr1, 40, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, align 8, addrspace 4)
@@ -1126,6 +1152,7 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") release (s32) into `ptr addrspace(2) poison`, addrspace 2)
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
@@ -1150,7 +1177,9 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") acq_rel (s32) into `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
$sgpr0 = S_LOAD_DWORD_IMM killed $sgpr0_sgpr1, 40, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, align 8, addrspace 4)
@@ -1174,7 +1203,9 @@ body: |
; GCN-NEXT: $m0 = S_MOV_B32 -1
; GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $sgpr2, implicit $exec, implicit $exec
; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: $vgpr2 = DS_WRXCHG_RTN_B32 killed renamable $vgpr0, killed renamable $vgpr1, 0, 1, implicit $m0, implicit $exec :: (volatile store syncscope("singlethread-one-as") seq_cst (s32) into `ptr addrspace(2) poison`, addrspace 2)
+ ; GCN-NEXT: S_WAITCNT_soft 3967
; GCN-NEXT: S_ENDPGM 0
$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 36, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
$sgpr0 = S_LOAD_DWORD_IMM killed $sgpr0_sgpr1, 40, 0 :: (dereferenceable invariant load (s32) from `ptr addrspace(4) poison`, align 8, addrspace 4)
diff --git a/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands-non-ptr-intrinsics.ll b/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands-non-ptr-intrinsics.ll
index 8426224d9dd50..51e9d67b0b77f 100644
--- a/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands-non-ptr-intrinsics.ll
+++ b/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands-non-ptr-intrinsics.ll
@@ -335,8 +335,8 @@ define void @mubuf_vgpr_adjacent_in_block(<4 x i32> %i, <4 x i32> %j, i32 %c, pt
; GFX1010_W32-NEXT: s_mov_b32 exec_lo, s5
; GFX1010_W32-NEXT: s_waitcnt vmcnt(1)
; GFX1010_W32-NEXT: global_store_dword v[9:10], v13, off
-; GFX1010_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W32-NEXT: s_waitcnt vmcnt(0)
+; GFX1010_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W32-NEXT: global_store_dword v[11:12], v0, off
; GFX1010_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W32-NEXT: s_setpc_b64 s[30:31]
@@ -381,8 +381,8 @@ define void @mubuf_vgpr_adjacent_in_block(<4 x i32> %i, <4 x i32> %j, i32 %c, pt
; GFX1010_W64-NEXT: s_mov_b64 exec, s[6:7]
; GFX1010_W64-NEXT: s_waitcnt vmcnt(1)
; GFX1010_W64-NEXT: global_store_dword v[9:10], v13, off
-; GFX1010_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W64-NEXT: s_waitcnt vmcnt(0)
+; GFX1010_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W64-NEXT: global_store_dword v[11:12], v0, off
; GFX1010_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W64-NEXT: s_setpc_b64 s[30:31]
@@ -430,8 +430,8 @@ define void @mubuf_vgpr_adjacent_in_block(<4 x i32> %i, <4 x i32> %j, i32 %c, pt
; GFX1100_W32-NEXT: s_mov_b32 exec_lo, s1
; GFX1100_W32-NEXT: s_waitcnt vmcnt(1)
; GFX1100_W32-NEXT: global_store_b32 v[9:10], v13, off dlc
-; GFX1100_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W32-NEXT: s_waitcnt vmcnt(0)
+; GFX1100_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W32-NEXT: global_store_b32 v[11:12], v0, off dlc
; GFX1100_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W32-NEXT: s_setpc_b64 s[30:31]
@@ -479,8 +479,8 @@ define void @mubuf_vgpr_adjacent_in_block(<4 x i32> %i, <4 x i32> %j, i32 %c, pt
; GFX1100_W64-NEXT: s_mov_b64 exec, s[2:3]
; GFX1100_W64-NEXT: s_waitcnt vmcnt(1)
; GFX1100_W64-NEXT: global_store_b32 v[9:10], v13, off dlc
-; GFX1100_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W64-NEXT: s_waitcnt vmcnt(0)
+; GFX1100_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W64-NEXT: global_store_b32 v[11:12], v0, off dlc
; GFX1100_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W64-NEXT: s_setpc_b64 s[30:31]
diff --git a/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll b/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll
index 1480743e435ff..0c77f290df1cd 100644
--- a/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll
+++ b/llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll
@@ -346,8 +346,8 @@ define void @mubuf_vgpr_adjacent_in_block(ptr addrspace(8) %i, ptr addrspace(8)
; GFX1010_W32-NEXT: s_mov_b32 exec_lo, s5
; GFX1010_W32-NEXT: s_waitcnt vmcnt(1)
; GFX1010_W32-NEXT: global_store_dword v[9:10], v13, off
-; GFX1010_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W32-NEXT: s_waitcnt vmcnt(0)
+; GFX1010_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W32-NEXT: global_store_dword v[11:12], v0, off
; GFX1010_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W32-NEXT: s_setpc_b64 s[30:31]
@@ -392,8 +392,8 @@ define void @mubuf_vgpr_adjacent_in_block(ptr addrspace(8) %i, ptr addrspace(8)
; GFX1010_W64-NEXT: s_mov_b64 exec, s[6:7]
; GFX1010_W64-NEXT: s_waitcnt vmcnt(1)
; GFX1010_W64-NEXT: global_store_dword v[9:10], v13, off
-; GFX1010_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W64-NEXT: s_waitcnt vmcnt(0)
+; GFX1010_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W64-NEXT: global_store_dword v[11:12], v0, off
; GFX1010_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1010_W64-NEXT: s_setpc_b64 s[30:31]
@@ -441,8 +441,8 @@ define void @mubuf_vgpr_adjacent_in_block(ptr addrspace(8) %i, ptr addrspace(8)
; GFX1100_W32-NEXT: s_mov_b32 exec_lo, s1
; GFX1100_W32-NEXT: s_waitcnt vmcnt(1)
; GFX1100_W32-NEXT: global_store_b32 v[9:10], v13, off dlc
-; GFX1100_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W32-NEXT: s_waitcnt vmcnt(0)
+; GFX1100_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W32-NEXT: global_store_b32 v[11:12], v0, off dlc
; GFX1100_W32-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W32-NEXT: s_setpc_b64 s[30:31]
@@ -490,8 +490,8 @@ define void @mubuf_vgpr_adjacent_in_block(ptr addrspace(8) %i, ptr addrspace(8)
; GFX1100_W64-NEXT: s_mov_b64 exec, s[2:3]
; GFX1100_W64-NEXT: s_waitcnt vmcnt(1)
; GFX1100_W64-NEXT: global_store_b32 v[9:10], v13, off dlc
-; GFX1100_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W64-NEXT: s_waitcnt vmcnt(0)
+; GFX1100_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W64-NEXT: global_store_b32 v[11:12], v0, off dlc
; GFX1100_W64-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1100_W64-NEXT: s_setpc_b64 s[30:31]
diff --git a/llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll b/llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll
index d2394bab82c77..df6e1ec50fcd5 100644
--- a/llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll
+++ b/llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll
@@ -944,6 +944,7 @@ define amdgpu_kernel void @kernel_stacksave_stackrestore_call_with_stack_objects
; WAVE32-O0-NEXT: v_writelane_b32 v32, s0, 1
; WAVE32-O0-NEXT: v_mov_b32_e32 v3, 42
; WAVE32-O0-NEXT: buffer_store_dword v3, off, s[20:23], 0
+; WAVE32-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; WAVE32-O0-NEXT: s_waitcnt_vscnt null, 0x0
; WAVE32-O0-NEXT: s_mov_b64 s[0:1], s[20:21]
; WAVE32-O0-NEXT: s_mov_b64 s[2:3], s[22:23]
@@ -1054,6 +1055,7 @@ define amdgpu_kernel void @kernel_stacksave_stackrestore_call_with_stack_objects
; WAVE64-O0-NEXT: v_writelane_b32 v32, s0, 1
; WAVE64-O0-NEXT: v_mov_b32_e32 v3, 42
; WAVE64-O0-NEXT: buffer_store_dword v3, off, s[24:27], 0
+; WAVE64-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; WAVE64-O0-NEXT: s_waitcnt_vscnt null, 0x0
; WAVE64-O0-NEXT: s_mov_b64 s[0:1], s[24:25]
; WAVE64-O0-NEXT: s_mov_b64 s[2:3], s[26:27]
@@ -1165,6 +1167,7 @@ define amdgpu_kernel void @kernel_stacksave_stackrestore_call_with_stack_objects
; WAVE32-WWM-PREALLOC-NEXT: v_writelane_b32 v32, s0, 1
; WAVE32-WWM-PREALLOC-NEXT: v_mov_b32_e32 v3, 42
; WAVE32-WWM-PREALLOC-NEXT: buffer_store_dword v3, off, s[20:23], 0
+; WAVE32-WWM-PREALLOC-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; WAVE32-WWM-PREALLOC-NEXT: s_waitcnt_vscnt null, 0x0
; WAVE32-WWM-PREALLOC-NEXT: s_mov_b64 s[0:1], s[20:21]
; WAVE32-WWM-PREALLOC-NEXT: s_mov_b64 s[2:3], s[22:23]
@@ -1350,6 +1353,7 @@ define void @func_stacksave_stackrestore_call_with_stack_objects() {
; WAVE32-O0-NEXT: v_writelane_b32 v33, s16, 1
; WAVE32-O0-NEXT: v_mov_b32_e32 v0, 42
; WAVE32-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33
+; WAVE32-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; WAVE32-O0-NEXT: s_waitcnt_vscnt null, 0x0
; WAVE32-O0-NEXT: s_mov_b64 s[22:23], s[2:3]
; WAVE32-O0-NEXT: s_mov_b64 s[20:21], s[0:1]
@@ -1461,6 +1465,7 @@ define void @func_stacksave_stackrestore_call_with_stack_objects() {
; WAVE64-O0-NEXT: v_writelane_b32 v33, s16, 1
; WAVE64-O0-NEXT: v_mov_b32_e32 v0, 42
; WAVE64-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33
+; WAVE64-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; WAVE64-O0-NEXT: s_waitcnt_vscnt null, 0x0
; WAVE64-O0-NEXT: s_mov_b64 s[22:23], s[2:3]
; WAVE64-O0-NEXT: s_mov_b64 s[20:21], s[0:1]
@@ -1572,6 +1577,7 @@ define void @func_stacksave_stackrestore_call_with_stack_objects() {
; WAVE32-WWM-PREALLOC-NEXT: v_writelane_b32 v32, s16, 1
; WAVE32-WWM-PREALLOC-NEXT: v_mov_b32_e32 v0, 42
; WAVE32-WWM-PREALLOC-NEXT: buffer_store_dword v0, off, s[0:3], s33
+; WAVE32-WWM-PREALLOC-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; WAVE32-WWM-PREALLOC-NEXT: s_waitcnt_vscnt null, 0x0
; WAVE32-WWM-PREALLOC-NEXT: s_mov_b64 s[22:23], s[2:3]
; WAVE32-WWM-PREALLOC-NEXT: s_mov_b64 s[20:21], s[0:1]
diff --git a/llvm/test/CodeGen/AMDGPU/trap-abis.ll b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
index 69cc63eba6243..4c4cf4a273e7f 100644
--- a/llvm/test/CodeGen/AMDGPU/trap-abis.ll
+++ b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
@@ -83,6 +83,7 @@ define amdgpu_kernel void @trap(ptr addrspace(1) nocapture readonly %arg0) {
; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 1
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
; HSA-TRAP-GFX1100-O0-NEXT: s_trap 2
; HSA-TRAP-GFX1100-O0-NEXT: s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
@@ -248,6 +249,7 @@ define amdgpu_kernel void @non_entry_trap(ptr addrspace(1) nocapture readonly %a
; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v0, 0
; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 3
; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
; HSA-TRAP-GFX1100-O0-NEXT: .LBB1_3: ; =>This Inner Loop Header: Depth=1
@@ -382,6 +384,7 @@ define amdgpu_kernel void @trap_with_use_after(ptr addrspace(1) %arg0, ptr addrs
; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v1, off, off offset:4 ; 4-byte Folded Reload
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
; HSA-TRAP-GFX1100-O0-NEXT: .LBB2_2:
@@ -482,10 +485,12 @@ define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0)
; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 1
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
; HSA-TRAP-GFX1100-O0-NEXT: s_trap 3
; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 2
; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(63) expcnt(7) lgkmcnt(63)
; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
store volatile i32 1, ptr addrspace(1) %arg0
diff --git a/llvm/test/CodeGen/AMDGPU/wait-before-stores-with-scope_sys.mir b/llvm/test/CodeGen/AMDGPU/wait-before-stores-with-scope_sys.mir
index acf8bd3a6ab56..8795f28cd4420 100644
--- a/llvm/test/CodeGen/AMDGPU/wait-before-stores-with-scope_sys.mir
+++ b/llvm/test/CodeGen/AMDGPU/wait-before-stores-with-scope_sys.mir
@@ -36,6 +36,7 @@ body: |
; GFX12-NEXT: S_WAIT_KMCNT_soft 0
; GFX12-NEXT: S_WAIT_STORECNT_soft 0
; GFX12-NEXT: GLOBAL_STORE_DWORD killed renamable $vgpr1_vgpr2, killed renamable $vgpr0, 0, 24, implicit $exec :: (volatile store (s32), addrspace 1)
+ ; GFX12-NEXT: S_WAIT_LOADCNT_soft 63
; GFX12-NEXT: S_WAIT_STORECNT_soft 0
; GFX12-NEXT: S_ENDPGM 0
GLOBAL_STORE_DWORD killed renamable $vgpr1_vgpr2, killed renamable $vgpr0, 0, 0, implicit $exec :: (volatile store (s32), addrspace 1)