[llvm] [AMDGPU] Make globally-addressable-scratch opt-in (PR #189555)
Pierre van Houtryve via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 31 01:02:44 PDT 2026
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/189555
This feature is meant to be opt-in for advanced users rather than enabled by default.
Enabling it may reduce performance, because the compiler can no longer assume that
the private address space is thread-local.
- Add `HasGloballyAddressableScratchSupport` feature to check if a target's scratch
addressing is changed due to support for globally addressable scratch.
- Use `EnableGloballyAddressableScratch` to check whether the user opted into
globally addressable scratch. This affects whether to lower scratch atomics as flat,
and in the future will affect whether NV=1 can be set on scratch accesses.
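The split between the two features can be sketched as below. This is a minimal, standalone illustration and not the actual LLVM code: `SubtargetSketch`, `PrivateAtomicLowering` and `lowerPrivateAtomic` are hypothetical names introduced here, and only the two flag names and `isGloballyAddressableScratchEnabled` mirror what the patch adds.

```cpp
// Hypothetical stand-in for the two subtarget bits described above.
// This is NOT the real GCNSubtarget class.
struct SubtargetSketch {
  // Set by the target definition: the hardware changes scratch addressing.
  bool HasGloballyAddressableScratchSupport = false;
  // Set only when the user opts in (e.g. +globally-addressable-scratch).
  bool EnableGloballyAddressableScratch = false;

  // The feature is only active when the hardware supports it AND the user
  // asked for it, matching the helper added to GCNSubtarget.h in the patch.
  bool isGloballyAddressableScratchEnabled() const {
    return HasGloballyAddressableScratchSupport &&
           EnableGloballyAddressableScratch;
  }
};

// Hypothetical enum standing in for the atomic expansion decision.
enum class PrivateAtomicLowering { NotAtomic, ExpandToFlat };

// Scratch atomics are expanded to FLAT only when the feature is enabled;
// otherwise private memory is treated as thread-local and the access does
// not need to be atomic.
inline PrivateAtomicLowering lowerPrivateAtomic(const SubtargetSketch &ST) {
  return ST.isGloballyAddressableScratchEnabled()
             ? PrivateAtomicLowering::ExpandToFlat
             : PrivateAtomicLowering::NotAtomic;
}
```

Under this model, a target that has the support bit but no user opt-in keeps the cheaper non-atomic lowering, which is the behavior change the updated gfx1250 tests check with the `GFX1250-NOGAS` and `GFX1250-GAS` prefixes.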
From a9da9150886f4cab0723b743caa0cb6fa845a426 Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Mon, 30 Mar 2026 11:55:36 +0200
Subject: [PATCH] [AMDGPU] Make globally-addressable-scratch opt-in
This feature is meant to be opt-in for more advanced users, not default-enabled.
It may reduce performance otherwise as we can't assume private AS is thread-local
when it is enabled.
- Add `HasGloballyAddressableScratchSupport` feature to check if a target's scratch
addressing is changed due to support for globally addressable scratch.
- Use `EnableGloballyAddressableScratch` to check whether the user opted into
globally addressable scratch. This affects whether to lower scratch atomics as flat,
and in the future will affect whether NV=1 can be set on scratch accesses.
---
llvm/docs/AMDGPUUsage.rst | 144 +-
llvm/lib/Target/AMDGPU/AMDGPU.td | 17 +-
.../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 8 +-
.../AMDGPU/AMDGPUTargetTransformInfo.cpp | 4 +-
.../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 6 +-
llvm/lib/Target/AMDGPU/GCNSubtarget.h | 5 +
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +-
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 2 +-
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 2 +-
.../AMDGPU/memory-legalizer-private-agent.ll | 8428 ++++++++++-------
.../memory-legalizer-private-cluster.ll | 7986 +++++++++-------
.../memory-legalizer-private-lastuse.ll | 6 +-
.../memory-legalizer-private-nontemporal.ll | 6 +-
.../memory-legalizer-private-singlethread.ll | 7605 ++++++++-------
.../AMDGPU/memory-legalizer-private-system.ll | 8020 +++++++++-------
.../memory-legalizer-private-volatile.ll | 6 +-
.../memory-legalizer-private-wavefront.ll | 7605 ++++++++-------
.../memory-legalizer-private-workgroup.ll | 7909 +++++++++-------
.../AMDGPU/expand-atomic-private-gas.ll | 221 +-
19 files changed, 27657 insertions(+), 20337 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 1ede5ca2d4cf6..388e1b33fe12d 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -541,27 +541,21 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
work-item
IDs
- ``gfx1250`` ``amdgcn`` APU - Architected *TBA*
- flat
- scratch .. TODO::
+ ``gfx1250`` ``amdgcn`` APU - globally- - Architected *TBA*
+ addressable- flat
+ scratch scratch .. TODO::
- Packed
work-item Add product
IDs names.
- - Globally
- Accessible
- Scratch
- Workgroup
Clusters
- ``gfx1251`` ``amdgcn`` APU - Architected *TBA*
- flat
- scratch .. TODO::
+ ``gfx1251`` ``amdgcn`` APU - globally- - Architected *TBA*
+ addressable- flat
+ scratch scratch .. TODO::
- Packed
work-item Add product
IDs names.
- - Globally
- Accessible
- Scratch
- Workgroup
Clusters
@@ -753,64 +747,70 @@ For example:
.. table:: AMDGPU Target Features
:name: amdgpu-target-features-table
- =============== ============================ ==================================================
- Target Feature Clang Option to Control Description
+ ============================= ============================ ==================================================
+ Target Feature Clang Option to Control Description
Name
- =============== ============================ ==================================================
- cumode - ``-m[no-]cumode`` Control the wavefront execution mode used
- when generating code for kernels. When disabled
- native WGP wavefront execution mode is used,
- when enabled CU wavefront execution mode is used
- (see :ref:`amdgpu-amdhsa-memory-model`).
-
- sramecc - ``-mcpu`` If specified, generate code that can only be
- - ``--offload-arch`` loaded and executed in a process that has a
- matching setting for SRAMECC.
-
- If not specified for code object V2 to V3, generate
- code that can be loaded and executed in a process
- with SRAMECC enabled.
-
- If not specified for code object V4 or above, generate
- code that can be loaded and executed in a process
- with either setting of SRAMECC.
-
- tgsplit ``-m[no-]tgsplit`` Enable/disable generating code that assumes
- work-groups are launched in threadgroup split mode.
- When enabled the waves of a work-group may be
- launched in different CUs.
-
- wavefrontsize64 - ``-m[no-]wavefrontsize64`` Control the wavefront size used when
- generating code for kernels. When disabled
- native wavefront size 32 is used, when enabled
- wavefront size 64 is used.
-
- xnack - ``-mcpu`` If specified, generate code that can only be
- - ``--offload-arch`` loaded and executed in a process that has a
- matching setting for XNACK replay.
-
- If not specified for code object V2 to V3, generate
- code that can be loaded and executed in a process
- with XNACK replay enabled.
-
- If not specified for code object V4 or above, generate
- code that can be loaded and executed in a process
- with either setting of XNACK replay.
-
- XNACK replay can be used for demand paging and
- page migration. If enabled in the device, then if
- a page fault occurs the code may execute
- incorrectly unless generated with XNACK replay
- enabled, or generated for code object V4 or above without
- specifying XNACK replay. Executing code that was
- generated with XNACK replay enabled, or generated
- for code object V4 or above without specifying XNACK replay,
- on a device that does not have XNACK replay
- enabled will execute correctly but may be less
- performant than code generated for XNACK replay
- disabled.
-
- =============== ============================ ==================================================
+ ============================= ============================ ==================================================
+ cumode - ``-m[no-]cumode`` Control the wavefront execution mode used
+ when generating code for kernels. When disabled
+ native WGP wavefront execution mode is used,
+ when enabled CU wavefront execution mode is used
+ (see :ref:`amdgpu-amdhsa-memory-model`).
+
+ sramecc - ``-mcpu`` If specified, generate code that can only be
+ - ``--offload-arch`` loaded and executed in a process that has a
+ matching setting for SRAMECC.
+
+ If not specified for code object V2 to V3, generate
+ code that can be loaded and executed in a process
+ with SRAMECC enabled.
+
+ If not specified for code object V4 or above, generate
+ code that can be loaded and executed in a process
+ with either setting of SRAMECC.
+
+ tgsplit ``-m[no-]tgsplit`` Enable/disable generating code that assumes
+ work-groups are launched in threadgroup split mode.
+ When enabled the waves of a work-group may be
+ launched in different CUs.
+
+ wavefrontsize64 - ``-m[no-]wavefrontsize64`` Control the wavefront size used when
+ generating code for kernels. When disabled
+ native wavefront size 32 is used, when enabled
+ wavefront size 64 is used.
+
+ xnack - ``-mcpu`` If specified, generate code that can only be
+ - ``--offload-arch`` loaded and executed in a process that has a
+ matching setting for XNACK replay.
+
+ If not specified for code object V2 to V3, generate
+ code that can be loaded and executed in a process
+ with XNACK replay enabled.
+
+ If not specified for code object V4 or above, generate
+ code that can be loaded and executed in a process
+ with either setting of XNACK replay.
+
+ XNACK replay can be used for demand paging and
+ page migration. If enabled in the device, then if
+ a page fault occurs the code may execute
+ incorrectly unless generated with XNACK replay
+ enabled, or generated for code object V4 or above without
+ specifying XNACK replay. Executing code that was
+ generated with XNACK replay enabled, or generated
+ for code object V4 or above without specifying XNACK replay,
+ on a device that does not have XNACK replay
+ enabled will execute correctly but may be less
+ performant than code generated for XNACK replay
+ disabled.
+
+ globally-addressable-scratch - ``--offload-arch`` When enabled, scratch (private) memory can be shared
+ between threads without triggering undefined behavior.
+ Disabled by default as this may incur a performance penalty
+ because the compiler can no longer assume private memory is
+ thread-local when this is enabled.
+
+ ============================= ============================ ==================================================
.. _amdgpu-target-id:
@@ -1009,7 +1009,7 @@ supported for the ``amdgcn`` target.
access is not supported except by flat and scratch instructions in
GFX9-GFX11.
- On targets without "Globally Accessible Scratch" (introduced in GFX125x), code that
+ On targets without ``globally-addressable-scratch``, or if the feature is disabled, code that
manipulates the stack values in other lanes of a wavefront, such as by
``addrspacecast``-ing stack pointers to generic ones and taking offsets that reach other
lanes or by explicitly constructing the scratch buffer descriptor, triggers undefined
@@ -17350,8 +17350,8 @@ For GFX125x:
This section is currently incomplete as work on the compiler is still ongoing.
The following is a non-exhaustive list of unimplemented/undocumented features:
- non-volatile bit code sequences, globally accessing scratch atomics,
- multicast loads, barriers (including split barriers) and cooperative atomics.
+ non-volatile bit code sequences, multicast loads, barriers (including split barriers)
+ and cooperative atomics.
Scalar operations memory model needs more elaboration as well.
* Vector memory operations are performed as wavefront wide operations, with the
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 4455f686205a6..9d191531fa9e6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1292,9 +1292,16 @@ defm XF32Insts : AMDGPUSubtargetFeature<"xf32-insts",
"v_mfma_f32_16x16x8_xf32 and v_mfma_f32_32x32x4_xf32"
>;
-defm GloballyAddressableScratch : AMDGPUSubtargetFeature<"globally-addressable-scratch",
- "FLAT instructions can access scratch memory for any thread in any wave",
- /*GenPredicate=*/0
+def FeatureGloballyAddressableScratchSupport : SubtargetFeature<"globally-addressable-scratch-support",
+ "HasGloballyAddressableScratchSupport",
+ "true",
+ "Hardware supports globally-addressable-scratch"
+>;
+
+def FeatureGloballyAddressableScratch : SubtargetFeature<"globally-addressable-scratch",
+ "EnableGloballyAddressableScratch",
+ "true",
+ "FLAT instructions can access scratch memory from any thread in any wave"
>;
// Enable the use of SCRATCH_STORE/LOAD_BLOCK instructions for saving and
@@ -2088,7 +2095,7 @@ def FeatureISAVersion12_50_Common : FeatureSet<
FeatureFlatBufferGlobalAtomicFaddF64Inst,
FeatureMemoryAtomicFAddF32DenormalSupport,
FeatureEmulatedSystemScopeAtomics,
- FeatureGloballyAddressableScratch,
+ FeatureGloballyAddressableScratchSupport,
FeatureKernargPreload,
FeatureVmemPrefInsts,
FeatureLshlAddU64Inst,
@@ -2190,7 +2197,7 @@ def FeatureISAVersion13 : FeatureSet<
FeatureAtomicFMinFMaxF64GlobalInsts,
FeatureAtomicFMinFMaxF64FlatInsts,
FeatureFmaMixBF16Insts,
- FeatureGloballyAddressableScratch,
+ FeatureGloballyAddressableScratchSupport,
FeatureCvtPkF16F32Inst,
FeatureF16BF16ToFP6BF6ConversionScaleInsts,
FeatureIEEEMinimumMaximumInsts,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index c78ef16b00983..4025c9d92bbad 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2356,7 +2356,7 @@ Register AMDGPULegalizerInfo::getSegmentAperture(
? AMDGPU::SRC_SHARED_BASE
: AMDGPU::SRC_PRIVATE_BASE;
assert((ApertureRegNo != AMDGPU::SRC_PRIVATE_BASE ||
- !ST.hasGloballyAddressableScratch()) &&
+ !ST.hasGloballyAddressableScratchSupport()) &&
"Cannot use src_private_base with globally addressable scratch!");
Register Dst = MRI.createGenericVirtualRegister(S64);
MRI.setRegClass(Dst, &AMDGPU::SReg_64RegClass);
@@ -2481,7 +2481,7 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
DestAS == AMDGPUAS::PRIVATE_ADDRESS)) {
auto castFlatToLocalOrPrivate = [&](const DstOp &Dst) -> Register {
if (DestAS == AMDGPUAS::PRIVATE_ADDRESS &&
- ST.hasGloballyAddressableScratch()) {
+ ST.hasGloballyAddressableScratchSupport()) {
// flat -> private with globally addressable scratch: subtract
// src_flat_scratch_base_lo.
const LLT S32 = LLT::scalar(32);
@@ -2532,7 +2532,7 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
Register SrcAsInt = B.buildPtrToInt(S32, Src).getReg(0);
if (SrcAS == AMDGPUAS::PRIVATE_ADDRESS &&
- ST.hasGloballyAddressableScratch()) {
+ ST.hasGloballyAddressableScratchSupport()) {
// For wave32: Addr = (TID[4:0] << 52) + FLAT_SCRATCH_BASE + privateAddr
// For wave64: Addr = (TID[5:0] << 51) + FLAT_SCRATCH_BASE + privateAddr
Register AllOnes = B.buildConstant(S32, -1).getReg(0);
@@ -6370,7 +6370,7 @@ bool AMDGPULegalizerInfo::legalizeIsAddrSpace(MachineInstr &MI,
Register Hi32 = Unmerge.getReg(1);
if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS &&
- ST.hasGloballyAddressableScratch()) {
+ ST.hasGloballyAddressableScratchSupport()) {
Register FlatScratchBaseHi =
B.buildInstr(AMDGPU::S_MOV_B32, {S32},
{Register(AMDGPU::SRC_FLAT_SCRATCH_BASE_HI)})
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index d02bc45bc14f6..6750be3031da9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -1061,7 +1061,7 @@ bool GCNTTIImpl::isSourceOfDivergence(const Value *V) const {
unsigned DstAS = Intrinsic->getType()->getPointerAddressSpace();
return SrcAS == AMDGPUAS::PRIVATE_ADDRESS &&
DstAS == AMDGPUAS::FLAT_ADDRESS &&
- ST->hasGloballyAddressableScratch();
+ ST->hasGloballyAddressableScratchSupport();
}
case Intrinsic::amdgcn_workitem_id_y:
case Intrinsic::amdgcn_workitem_id_z: {
@@ -1094,7 +1094,7 @@ bool GCNTTIImpl::isSourceOfDivergence(const Value *V) const {
if (auto *CastI = dyn_cast<AddrSpaceCastInst>(V)) {
return CastI->getSrcAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS &&
CastI->getDestAddressSpace() == AMDGPUAS::FLAT_ADDRESS &&
- ST->hasGloballyAddressableScratch();
+ ST->hasGloballyAddressableScratchSupport();
}
return false;
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 2236a98c58330..06311cad96efa 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -1612,8 +1612,8 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
return getFeatureBits()[AMDGPU::FeaturePartialNSAEncoding];
}
- bool hasGloballyAddressableScratch() const {
- return getFeatureBits()[AMDGPU::FeatureGloballyAddressableScratch];
+ bool hasGloballyAddressableScratchSupport() const {
+ return getFeatureBits()[AMDGPU::FeatureGloballyAddressableScratchSupport];
}
unsigned getNSAMaxSize(bool HasSampler = false) const {
@@ -6814,7 +6814,7 @@ bool AMDGPUAsmParser::subtargetHasRegister(const MCRegisterInfo &MRI,
return isGFX9Plus();
case SRC_FLAT_SCRATCH_BASE_LO:
case SRC_FLAT_SCRATCH_BASE_HI:
- return hasGloballyAddressableScratch();
+ return hasGloballyAddressableScratchSupport();
case SRC_POPS_EXITING_WAVE_ID:
return isGFX9Plus() && !isGFX11Plus();
case TBA:
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index fe5fbbe04fb67..e6fbee1fe3f8d 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -1008,6 +1008,11 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
bool requiresWaitOnWorkgroupReleaseFence() const {
return getGeneration() >= GFX10 || isTgSplitEnabled();
}
+
+ bool isGloballyAddressableScratchEnabled() const {
+ return HasGloballyAddressableScratchSupport &&
+ EnableGloballyAddressableScratch;
+ }
};
class GCNUserSGPRUsageInfo {
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 0bc509c4a6b29..94aa0d7200eda 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2276,7 +2276,7 @@ bool SITargetLowering::isFreeAddrSpaceCast(unsigned SrcAS,
unsigned DestAS) const {
if (SrcAS == AMDGPUAS::FLAT_ADDRESS) {
if (DestAS == AMDGPUAS::PRIVATE_ADDRESS &&
- Subtarget->hasGloballyAddressableScratch()) {
+ Subtarget->hasGloballyAddressableScratchSupport()) {
// Flat -> private requires subtracting src_flat_scratch_base_lo.
return false;
}
@@ -8791,7 +8791,7 @@ SDValue SITargetLowering::getSegmentAperture(unsigned AS, const SDLoc &DL,
? AMDGPU::SRC_SHARED_BASE
: AMDGPU::SRC_PRIVATE_BASE;
assert((ApertureRegNo != AMDGPU::SRC_PRIVATE_BASE ||
- !Subtarget->hasGloballyAddressableScratch()) &&
+ !Subtarget->hasGloballyAddressableScratchSupport()) &&
"Cannot use src_private_base with globally addressable scratch!");
// Note: this feature (register) is broken. When used as a 32-bit operand,
// it returns a wrong value (all zeroes?). The real value is in the upper 32
@@ -8893,7 +8893,7 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
SDValue Ptr = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, Src);
if (DestAS == AMDGPUAS::PRIVATE_ADDRESS &&
- Subtarget->hasGloballyAddressableScratch()) {
+ Subtarget->hasGloballyAddressableScratchSupport()) {
// flat -> private with globally addressable scratch: subtract
// src_flat_scratch_base_lo.
SDValue FlatScratchBaseLo(
@@ -8922,7 +8922,7 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
SrcAS == AMDGPUAS::PRIVATE_ADDRESS) {
SDValue CvtPtr;
if (SrcAS == AMDGPUAS::PRIVATE_ADDRESS &&
- Subtarget->hasGloballyAddressableScratch()) {
+ Subtarget->hasGloballyAddressableScratchSupport()) {
// For wave32: Addr = (TID[4:0] << 52) + FLAT_SCRATCH_BASE + privateAddr
// For wave64: Addr = (TID[5:0] << 51) + FLAT_SCRATCH_BASE + privateAddr
SDValue AllOnes = DAG.getSignedTargetConstant(-1, SL, MVT::i32);
@@ -10715,7 +10715,7 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
? AMDGPUAS::LOCAL_ADDRESS
: AMDGPUAS::PRIVATE_ADDRESS;
if (AS == AMDGPUAS::PRIVATE_ADDRESS &&
- Subtarget->hasGloballyAddressableScratch()) {
+ Subtarget->hasGloballyAddressableScratchSupport()) {
SDValue FlatScratchBaseHi(
DAG.getMachineNode(
AMDGPU::S_MOV_B32, DL, MVT::i32,
@@ -19436,8 +19436,8 @@ static bool flatInstrMayAccessPrivate(const Instruction *I) {
static TargetLowering::AtomicExpansionKind
getPrivateAtomicExpansionKind(const GCNSubtarget &STI) {
- // For GAS, lower to flat atomic.
- return STI.hasGloballyAddressableScratch()
+ // If GAS is enabled, scratch atomics need to be expanded to FLAT.
+ return STI.isGloballyAddressableScratchEnabled()
? TargetLowering::AtomicExpansionKind::CustomExpand
: TargetLowering::AtomicExpansionKind::NotAtomic;
}
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index df2700d414893..8d858db0c6bfe 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -10740,7 +10740,7 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const {
unsigned SrcAS = SrcTy.getAddressSpace();
return SrcAS == AMDGPUAS::PRIVATE_ADDRESS &&
DstAS == AMDGPUAS::FLAT_ADDRESS &&
- ST.hasGloballyAddressableScratch()
+ ST.hasGloballyAddressableScratchSupport()
? InstructionUniformity::NeverUniform
: InstructionUniformity::Default;
};
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index e5f352a3ed110..a7dcc6d8aa849 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -934,7 +934,7 @@ bool SICacheControl::enableCPolBits(const MachineBasicBlock::iterator MI,
}
bool SICacheControl::canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const {
- assert((!ST.hasGloballyAddressableScratch() ||
+ assert((!ST.isGloballyAddressableScratchEnabled() ||
(AS & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE ||
(AS & SIAtomicAddrSpace::SCRATCH) == SIAtomicAddrSpace::NONE) &&
"scratch instructions should already be replaced by flat "
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
index 220bc97a6822e..776039ee26ff7 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_agent_unordered_load(
; GFX6-LABEL: private_agent_unordered_load:
@@ -177,36 +178,47 @@ define amdgpu_kernel void @private_agent_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent") unordered, align 4
@@ -377,36 +389,47 @@ define amdgpu_kernel void @private_agent_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent") monotonic, align 4
@@ -577,38 +600,49 @@ define amdgpu_kernel void @private_agent_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent") acquire, align 4
@@ -779,40 +813,51 @@ define amdgpu_kernel void @private_agent_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent") seq_cst, align 4
@@ -963,36 +1008,46 @@ define amdgpu_kernel void @private_agent_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent") unordered, align 4
@@ -1142,36 +1197,46 @@ define amdgpu_kernel void @private_agent_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent") monotonic, align 4
@@ -1321,41 +1386,51 @@ define amdgpu_kernel void @private_agent_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent") release, align 4
@@ -1505,41 +1580,51 @@ define amdgpu_kernel void @private_agent_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent") seq_cst, align 4
@@ -1689,36 +1774,46 @@ define amdgpu_kernel void @private_agent_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") monotonic
@@ -1868,39 +1963,49 @@ define amdgpu_kernel void @private_agent_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") acquire
@@ -2050,41 +2155,51 @@ define amdgpu_kernel void @private_agent_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") release
@@ -2234,44 +2349,54 @@ define amdgpu_kernel void @private_agent_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") acq_rel
@@ -2421,44 +2546,54 @@ define amdgpu_kernel void @private_agent_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") seq_cst
@@ -2662,40 +2797,53 @@ define amdgpu_kernel void @private_agent_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") acquire
@@ -2900,45 +3048,58 @@ define amdgpu_kernel void @private_agent_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") acq_rel
@@ -3143,45 +3304,58 @@ define amdgpu_kernel void @private_agent_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent") seq_cst
@@ -3416,42 +3590,56 @@ define amdgpu_kernel void @private_agent_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3686,45 +3874,59 @@ define amdgpu_kernel void @private_agent_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3959,47 +4161,61 @@ define amdgpu_kernel void @private_agent_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4234,50 +4450,64 @@ define amdgpu_kernel void @private_agent_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4512,50 +4742,64 @@ define amdgpu_kernel void @private_agent_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4790,45 +5034,59 @@ define amdgpu_kernel void @private_agent_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5063,45 +5321,59 @@ define amdgpu_kernel void @private_agent_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5336,50 +5608,64 @@ define amdgpu_kernel void @private_agent_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5614,50 +5900,64 @@ define amdgpu_kernel void @private_agent_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5892,50 +6192,64 @@ define amdgpu_kernel void @private_agent_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6170,50 +6484,64 @@ define amdgpu_kernel void @private_agent_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6448,50 +6776,64 @@ define amdgpu_kernel void @private_agent_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6726,50 +7068,64 @@ define amdgpu_kernel void @private_agent_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7004,50 +7360,64 @@ define amdgpu_kernel void @private_agent_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7282,50 +7652,64 @@ define amdgpu_kernel void @private_agent_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7588,44 +7972,59 @@ define amdgpu_kernel void @private_agent_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7890,46 +8289,61 @@ define amdgpu_kernel void @private_agent_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8194,49 +8608,64 @@ define amdgpu_kernel void @private_agent_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8501,51 +8930,66 @@ define amdgpu_kernel void @private_agent_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8810,51 +9254,66 @@ define amdgpu_kernel void @private_agent_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9119,46 +9578,61 @@ define amdgpu_kernel void @private_agent_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9423,46 +9897,61 @@ define amdgpu_kernel void @private_agent_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9727,51 +10216,66 @@ define amdgpu_kernel void @private_agent_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10036,51 +10540,66 @@ define amdgpu_kernel void @private_agent_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10345,51 +10864,66 @@ define amdgpu_kernel void @private_agent_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10654,51 +11188,66 @@ define amdgpu_kernel void @private_agent_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10963,51 +11512,66 @@ define amdgpu_kernel void @private_agent_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11272,51 +11836,66 @@ define amdgpu_kernel void @private_agent_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11581,51 +12160,66 @@ define amdgpu_kernel void @private_agent_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11890,51 +12484,66 @@ define amdgpu_kernel void @private_agent_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -12107,36 +12716,47 @@ define amdgpu_kernel void @private_agent_one_as_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent-one-as") unordered, align 4
@@ -12307,36 +12927,47 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent-one-as") monotonic, align 4
@@ -12507,38 +13138,49 @@ define amdgpu_kernel void @private_agent_one_as_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent-one-as") acquire, align 4
@@ -12709,40 +13351,51 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("agent-one-as") seq_cst, align 4
@@ -12893,36 +13546,46 @@ define amdgpu_kernel void @private_agent_one_as_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent-one-as") unordered, align 4
@@ -13072,36 +13735,46 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent-one-as") monotonic, align 4
@@ -13251,41 +13924,51 @@ define amdgpu_kernel void @private_agent_one_as_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent-one-as") release, align 4
@@ -13435,41 +14118,51 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("agent-one-as") seq_cst, align 4
@@ -13619,36 +14312,46 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") monotonic
@@ -13798,39 +14501,49 @@ define amdgpu_kernel void @private_agent_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") acquire
@@ -13980,41 +14693,51 @@ define amdgpu_kernel void @private_agent_one_as_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") release
@@ -14164,44 +14887,54 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") acq_rel
@@ -14351,44 +15084,54 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") seq_cst
@@ -14592,40 +15335,53 @@ define amdgpu_kernel void @private_agent_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") acquire
@@ -14830,45 +15586,58 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") acq_rel
@@ -15073,45 +15842,58 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("agent-one-as") seq_cst
@@ -15346,42 +16128,56 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15616,45 +16412,59 @@ define amdgpu_kernel void @private_agent_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15889,47 +16699,61 @@ define amdgpu_kernel void @private_agent_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16164,50 +16988,64 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16442,50 +17280,64 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16720,45 +17572,59 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16993,45 +17859,59 @@ define amdgpu_kernel void @private_agent_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17266,50 +18146,64 @@ define amdgpu_kernel void @private_agent_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17544,50 +18438,64 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17822,50 +18730,64 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18100,50 +19022,64 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18378,50 +19314,64 @@ define amdgpu_kernel void @private_agent_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18656,50 +19606,64 @@ define amdgpu_kernel void @private_agent_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18934,50 +19898,64 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19212,50 +20190,64 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19518,44 +20510,59 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19820,46 +20827,61 @@ define amdgpu_kernel void @private_agent_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20124,51 +21146,66 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20433,51 +21470,66 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20742,46 +21794,61 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21046,46 +22113,61 @@ define amdgpu_kernel void @private_agent_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21350,51 +22432,66 @@ define amdgpu_kernel void @private_agent_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21659,51 +22756,66 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21968,51 +23080,66 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22277,51 +23404,66 @@ define amdgpu_kernel void @private_agent_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22586,51 +23728,66 @@ define amdgpu_kernel void @private_agent_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22895,51 +24052,66 @@ define amdgpu_kernel void @private_agent_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23204,51 +24376,66 @@ define amdgpu_kernel void @private_agent_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23513,51 +24700,66 @@ define amdgpu_kernel void @private_agent_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_DEV
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_DEV
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23566,3 +24768,5 @@ entry:
store i32 %val0, ptr addrspace(5) %out, align 4
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
index 33dbd9e50b52b..ec4482b56a4a3 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_cluster_unordered_load(
; GFX6-LABEL: private_cluster_unordered_load:
@@ -177,36 +178,47 @@ define amdgpu_kernel void @private_cluster_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster") unordered, align 4
@@ -377,36 +389,47 @@ define amdgpu_kernel void @private_cluster_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster") monotonic, align 4
@@ -577,37 +600,48 @@ define amdgpu_kernel void @private_cluster_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster") acquire, align 4
@@ -778,39 +812,50 @@ define amdgpu_kernel void @private_cluster_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster") seq_cst, align 4
@@ -961,36 +1006,46 @@ define amdgpu_kernel void @private_cluster_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster") unordered, align 4
@@ -1140,36 +1195,46 @@ define amdgpu_kernel void @private_cluster_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster") monotonic, align 4
@@ -1319,38 +1384,48 @@ define amdgpu_kernel void @private_cluster_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster") release, align 4
@@ -1500,38 +1575,48 @@ define amdgpu_kernel void @private_cluster_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster") seq_cst, align 4
@@ -1681,36 +1766,46 @@ define amdgpu_kernel void @private_cluster_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") monotonic
@@ -1860,38 +1955,48 @@ define amdgpu_kernel void @private_cluster_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") acquire
@@ -2041,38 +2146,48 @@ define amdgpu_kernel void @private_cluster_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") release
@@ -2222,40 +2337,50 @@ define amdgpu_kernel void @private_cluster_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") acq_rel
@@ -2405,40 +2530,50 @@ define amdgpu_kernel void @private_cluster_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") seq_cst
@@ -2642,39 +2777,52 @@ define amdgpu_kernel void @private_cluster_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") acquire
@@ -2879,41 +3027,54 @@ define amdgpu_kernel void @private_cluster_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") acq_rel
@@ -3118,41 +3279,54 @@ define amdgpu_kernel void @private_cluster_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster") seq_cst
@@ -3387,42 +3561,56 @@ define amdgpu_kernel void @private_cluster_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3657,44 +3845,58 @@ define amdgpu_kernel void @private_cluster_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3929,44 +4131,58 @@ define amdgpu_kernel void @private_cluster_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4201,46 +4417,60 @@ define amdgpu_kernel void @private_cluster_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4475,46 +4705,60 @@ define amdgpu_kernel void @private_cluster_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4749,44 +4993,58 @@ define amdgpu_kernel void @private_cluster_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5021,44 +5279,58 @@ define amdgpu_kernel void @private_cluster_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5293,46 +5565,60 @@ define amdgpu_kernel void @private_cluster_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5567,46 +5853,60 @@ define amdgpu_kernel void @private_cluster_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5841,46 +6141,60 @@ define amdgpu_kernel void @private_cluster_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6115,46 +6429,60 @@ define amdgpu_kernel void @private_cluster_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6389,46 +6717,60 @@ define amdgpu_kernel void @private_cluster_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6663,46 +7005,60 @@ define amdgpu_kernel void @private_cluster_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6937,46 +7293,60 @@ define amdgpu_kernel void @private_cluster_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7211,46 +7581,60 @@ define amdgpu_kernel void @private_cluster_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7513,44 +7897,59 @@ define amdgpu_kernel void @private_cluster_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7815,45 +8214,60 @@ define amdgpu_kernel void @private_cluster_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8118,46 +8532,61 @@ define amdgpu_kernel void @private_cluster_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8422,47 +8851,62 @@ define amdgpu_kernel void @private_cluster_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8727,47 +9171,62 @@ define amdgpu_kernel void @private_cluster_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9032,45 +9491,60 @@ define amdgpu_kernel void @private_cluster_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9335,45 +9809,60 @@ define amdgpu_kernel void @private_cluster_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9638,47 +10127,62 @@ define amdgpu_kernel void @private_cluster_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9943,47 +10447,62 @@ define amdgpu_kernel void @private_cluster_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10248,47 +10767,62 @@ define amdgpu_kernel void @private_cluster_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10553,47 +11087,62 @@ define amdgpu_kernel void @private_cluster_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10858,47 +11407,62 @@ define amdgpu_kernel void @private_cluster_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11163,47 +11727,62 @@ define amdgpu_kernel void @private_cluster_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11468,47 +12047,62 @@ define amdgpu_kernel void @private_cluster_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11773,47 +12367,62 @@ define amdgpu_kernel void @private_cluster_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11986,36 +12595,47 @@ define amdgpu_kernel void @private_cluster_one_as_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster-one-as") unordered, align 4
@@ -12186,36 +12806,47 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster-one-as") monotonic, align 4
@@ -12386,38 +13017,49 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster-one-as") acquire, align 4
@@ -12588,40 +13230,51 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("cluster-one-as") seq_cst, align 4
@@ -12772,36 +13425,46 @@ define amdgpu_kernel void @private_cluster_one_as_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster-one-as") unordered, align 4
@@ -12951,36 +13614,46 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster-one-as") monotonic, align 4
@@ -13130,38 +13803,48 @@ define amdgpu_kernel void @private_cluster_one_as_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster-one-as") release, align 4
@@ -13311,38 +13994,48 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("cluster-one-as") seq_cst, align 4
@@ -13492,36 +14185,46 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") monotonic
@@ -13671,38 +14374,48 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") acquire
@@ -13852,38 +14565,48 @@ define amdgpu_kernel void @private_cluster_one_as_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") release
@@ -14033,40 +14756,50 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") acq_rel
@@ -14216,40 +14949,50 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") seq_cst
@@ -14453,40 +15196,53 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") acquire
@@ -14691,42 +15447,55 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") acq_rel
@@ -14931,42 +15700,55 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("cluster-one-as") seq_cst
@@ -15201,42 +15983,56 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15471,44 +16267,58 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15743,44 +16553,58 @@ define amdgpu_kernel void @private_cluster_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16015,46 +16839,60 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16289,46 +17127,60 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16563,44 +17415,58 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16835,44 +17701,58 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17107,46 +17987,60 @@ define amdgpu_kernel void @private_cluster_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17381,46 +18275,60 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17655,46 +18563,60 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17929,46 +18851,60 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18203,46 +19139,60 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18477,46 +19427,60 @@ define amdgpu_kernel void @private_cluster_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18751,46 +19715,60 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19025,46 +20003,60 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19327,44 +20319,59 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19629,46 +20636,61 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19933,48 +20955,63 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20239,48 +21276,63 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20545,46 +21597,61 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20849,46 +21916,61 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21153,48 +22235,63 @@ define amdgpu_kernel void @private_cluster_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21459,48 +22556,63 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21765,48 +22877,63 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22071,48 +23198,63 @@ define amdgpu_kernel void @private_cluster_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22377,48 +23519,63 @@ define amdgpu_kernel void @private_cluster_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22683,48 +23840,63 @@ define amdgpu_kernel void @private_cluster_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22989,48 +24161,63 @@ define amdgpu_kernel void @private_cluster_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23295,48 +24482,63 @@ define amdgpu_kernel void @private_cluster_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_cluster_one_as_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SE
-; GFX1250-NEXT: s_wait_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_cluster_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_cluster_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SE
+; GFX1250-GAS-NEXT: s_wait_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23345,3 +24547,5 @@ entry:
store i32 %val0, ptr addrspace(5) %out, align 4
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-lastuse.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-lastuse.ll
index f7bdceb5bd5c3..92b8e66f6f505 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-lastuse.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-lastuse.ll
@@ -1,7 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefix=GFX12 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefix=GFX12 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_last_use_load_0(ptr addrspace(5) %in, ptr addrspace(1) %out) {
; GFX12-LABEL: private_last_use_load_0:
@@ -135,3 +136,6 @@ entry:
!0 = !{i32 1}
declare i32 @llvm.amdgcn.workitem.id.x()
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250-GAS: {{.*}}
+; GFX1250-NOGAS: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll
index 5e79e414b7414..62e41e8cf4efe 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_nontemporal_load_0(
; GFX6-LABEL: private_nontemporal_load_0:
@@ -1100,3 +1101,6 @@ entry:
!0 = !{i32 1}
declare i32 @llvm.amdgcn.workitem.id.x()
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250-GAS: {{.*}}
+; GFX1250-NOGAS: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
index 929665f377372..70ab5091b5123 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_singlethread_unordered_load(
; GFX6-LABEL: private_singlethread_unordered_load:
@@ -177,36 +178,47 @@ define amdgpu_kernel void @private_singlethread_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread") unordered, align 4
@@ -377,36 +389,47 @@ define amdgpu_kernel void @private_singlethread_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread") monotonic, align 4
@@ -577,36 +600,47 @@ define amdgpu_kernel void @private_singlethread_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread") acquire, align 4
@@ -777,36 +811,47 @@ define amdgpu_kernel void @private_singlethread_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread") seq_cst, align 4
@@ -957,36 +1002,46 @@ define amdgpu_kernel void @private_singlethread_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread") unordered, align 4
@@ -1136,36 +1191,46 @@ define amdgpu_kernel void @private_singlethread_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread") monotonic, align 4
@@ -1315,36 +1380,46 @@ define amdgpu_kernel void @private_singlethread_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread") release, align 4
@@ -1494,36 +1569,46 @@ define amdgpu_kernel void @private_singlethread_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread") seq_cst, align 4
@@ -1673,36 +1758,46 @@ define amdgpu_kernel void @private_singlethread_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") monotonic
@@ -1852,36 +1947,46 @@ define amdgpu_kernel void @private_singlethread_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") acquire
@@ -2031,36 +2136,46 @@ define amdgpu_kernel void @private_singlethread_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") release
@@ -2210,36 +2325,46 @@ define amdgpu_kernel void @private_singlethread_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") acq_rel
@@ -2389,36 +2514,46 @@ define amdgpu_kernel void @private_singlethread_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") seq_cst
@@ -2622,38 +2757,51 @@ define amdgpu_kernel void @private_singlethread_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") acquire
@@ -2858,38 +3006,51 @@ define amdgpu_kernel void @private_singlethread_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") acq_rel
@@ -3094,38 +3255,51 @@ define amdgpu_kernel void @private_singlethread_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread") seq_cst
@@ -3360,42 +3534,56 @@ define amdgpu_kernel void @private_singlethread_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3630,42 +3818,56 @@ define amdgpu_kernel void @private_singlethread_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3900,42 +4102,56 @@ define amdgpu_kernel void @private_singlethread_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4170,42 +4386,56 @@ define amdgpu_kernel void @private_singlethread_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4440,42 +4670,56 @@ define amdgpu_kernel void @private_singlethread_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4710,42 +4954,56 @@ define amdgpu_kernel void @private_singlethread_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4980,42 +5238,56 @@ define amdgpu_kernel void @private_singlethread_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5250,42 +5522,56 @@ define amdgpu_kernel void @private_singlethread_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5520,42 +5806,56 @@ define amdgpu_kernel void @private_singlethread_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5790,42 +6090,56 @@ define amdgpu_kernel void @private_singlethread_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6060,42 +6374,56 @@ define amdgpu_kernel void @private_singlethread_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6330,42 +6658,56 @@ define amdgpu_kernel void @private_singlethread_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6600,42 +6942,56 @@ define amdgpu_kernel void @private_singlethread_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6870,42 +7226,56 @@ define amdgpu_kernel void @private_singlethread_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7140,42 +7510,56 @@ define amdgpu_kernel void @private_singlethread_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7438,44 +7822,59 @@ define amdgpu_kernel void @private_singlethread_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7740,44 +8139,59 @@ define amdgpu_kernel void @private_singlethread_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8042,44 +8456,59 @@ define amdgpu_kernel void @private_singlethread_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8344,44 +8773,59 @@ define amdgpu_kernel void @private_singlethread_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8646,44 +9090,59 @@ define amdgpu_kernel void @private_singlethread_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8948,44 +9407,59 @@ define amdgpu_kernel void @private_singlethread_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9250,44 +9724,59 @@ define amdgpu_kernel void @private_singlethread_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9552,44 +10041,59 @@ define amdgpu_kernel void @private_singlethread_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9854,44 +10358,59 @@ define amdgpu_kernel void @private_singlethread_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10156,44 +10675,59 @@ define amdgpu_kernel void @private_singlethread_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10458,44 +10992,59 @@ define amdgpu_kernel void @private_singlethread_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10760,44 +11309,59 @@ define amdgpu_kernel void @private_singlethread_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11062,44 +11626,59 @@ define amdgpu_kernel void @private_singlethread_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11364,44 +11943,59 @@ define amdgpu_kernel void @private_singlethread_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11666,44 +12260,59 @@ define amdgpu_kernel void @private_singlethread_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11876,36 +12485,47 @@ define amdgpu_kernel void @private_singlethread_one_as_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread-one-as") unordered, align 4
@@ -12076,36 +12696,47 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread-one-as") monotonic, align 4
@@ -12276,36 +12907,47 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread-one-as") acquire, align 4
@@ -12476,36 +13118,47 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("singlethread-one-as") seq_cst, align 4
@@ -12656,36 +13309,46 @@ define amdgpu_kernel void @private_singlethread_one_as_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread-one-as") unordered, align 4
@@ -12835,36 +13498,46 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread-one-as") monotonic, align 4
@@ -13014,36 +13687,46 @@ define amdgpu_kernel void @private_singlethread_one_as_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread-one-as") release, align 4
@@ -13193,36 +13876,46 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("singlethread-one-as") seq_cst, align 4
@@ -13372,36 +14065,46 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") monotonic
@@ -13551,36 +14254,46 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") acquire
@@ -13730,36 +14443,46 @@ define amdgpu_kernel void @private_singlethread_one_as_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") release
@@ -13909,36 +14632,46 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") acq_rel
@@ -14088,36 +14821,46 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") seq_cst
@@ -14321,38 +15064,51 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") acquire
@@ -14557,38 +15313,51 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") acq_rel
@@ -14793,38 +15562,51 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("singlethread-one-as") seq_cst
@@ -15059,42 +15841,56 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_monotonic_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15329,42 +16125,56 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_monotonic_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15599,42 +16409,56 @@ define amdgpu_kernel void @private_singlethread_one_as_release_monotonic_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15869,42 +16693,56 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_monotonic_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16139,42 +16977,56 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_monotonic_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16409,42 +17261,56 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_acquire_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16679,42 +17545,56 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16949,42 +17829,56 @@ define amdgpu_kernel void @private_singlethread_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17219,42 +18113,56 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17489,42 +18397,56 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17759,42 +18681,56 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_seq_cst_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18029,42 +18965,56 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18299,42 +19249,56 @@ define amdgpu_kernel void @private_singlethread_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18569,42 +19533,56 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18839,42 +19817,56 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19137,44 +20129,59 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_monotonic_ret_c
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19439,44 +20446,59 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_monotonic_ret_cmp
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19741,44 +20763,59 @@ define amdgpu_kernel void @private_singlethread_one_as_release_monotonic_ret_cmp
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20043,44 +21080,59 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_monotonic_ret_cmp
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20345,44 +21397,59 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_monotonic_ret_cmp
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20647,44 +21714,59 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_acquire_ret_cmp
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20949,44 +22031,59 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_acquire_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21251,44 +22348,59 @@ define amdgpu_kernel void @private_singlethread_one_as_release_acquire_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21553,44 +22665,59 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_acquire_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21855,44 +22982,59 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_acquire_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22157,44 +23299,59 @@ define amdgpu_kernel void @private_singlethread_one_as_monotonic_seq_cst_ret_cmp
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22459,44 +23616,59 @@ define amdgpu_kernel void @private_singlethread_one_as_acquire_seq_cst_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22761,44 +23933,59 @@ define amdgpu_kernel void @private_singlethread_one_as_release_seq_cst_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23063,44 +24250,59 @@ define amdgpu_kernel void @private_singlethread_one_as_acq_rel_seq_cst_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23365,44 +24567,59 @@ define amdgpu_kernel void @private_singlethread_one_as_seq_cst_seq_cst_ret_cmpxc
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23411,3 +24628,5 @@ entry:
store i32 %val0, ptr addrspace(5) %out, align 4
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
index 5940897e1c801..8771334a2bcf3 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_system_unordered_load(
; GFX6-LABEL: private_system_unordered_load:
@@ -177,36 +178,47 @@ define amdgpu_kernel void @private_system_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in unordered, align 4
@@ -377,36 +389,47 @@ define amdgpu_kernel void @private_system_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in monotonic, align 4
@@ -577,38 +600,49 @@ define amdgpu_kernel void @private_system_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in acquire, align 4
@@ -779,40 +813,51 @@ define amdgpu_kernel void @private_system_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in seq_cst, align 4
@@ -963,36 +1008,46 @@ define amdgpu_kernel void @private_system_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out unordered, align 4
@@ -1142,36 +1197,46 @@ define amdgpu_kernel void @private_system_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out monotonic, align 4
@@ -1321,41 +1386,51 @@ define amdgpu_kernel void @private_system_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out release, align 4
@@ -1505,41 +1580,51 @@ define amdgpu_kernel void @private_system_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out seq_cst, align 4
@@ -1689,36 +1774,46 @@ define amdgpu_kernel void @private_system_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in monotonic
@@ -1868,39 +1963,49 @@ define amdgpu_kernel void @private_system_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in acquire
@@ -2050,41 +2155,51 @@ define amdgpu_kernel void @private_system_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in release
@@ -2234,44 +2349,54 @@ define amdgpu_kernel void @private_system_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in acq_rel
@@ -2421,44 +2546,54 @@ define amdgpu_kernel void @private_system_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in seq_cst
@@ -2662,40 +2797,53 @@ define amdgpu_kernel void @private_system_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in acquire
@@ -2900,45 +3048,58 @@ define amdgpu_kernel void @private_system_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in acq_rel
@@ -3143,45 +3304,58 @@ define amdgpu_kernel void @private_system_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in seq_cst
@@ -3416,42 +3590,56 @@ define amdgpu_kernel void @private_system_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3686,45 +3874,59 @@ define amdgpu_kernel void @private_system_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3959,47 +4161,61 @@ define amdgpu_kernel void @private_system_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4234,50 +4450,64 @@ define amdgpu_kernel void @private_system_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4512,50 +4742,64 @@ define amdgpu_kernel void @private_system_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4790,45 +5034,59 @@ define amdgpu_kernel void @private_system_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5063,45 +5321,59 @@ define amdgpu_kernel void @private_system_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5336,50 +5608,64 @@ define amdgpu_kernel void @private_system_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5614,50 +5900,64 @@ define amdgpu_kernel void @private_system_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5892,50 +6192,64 @@ define amdgpu_kernel void @private_system_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6170,50 +6484,64 @@ define amdgpu_kernel void @private_system_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6476,44 +6804,59 @@ define amdgpu_kernel void @private_system_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6778,46 +7121,61 @@ define amdgpu_kernel void @private_system_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7082,51 +7440,66 @@ define amdgpu_kernel void @private_system_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7391,51 +7764,66 @@ define amdgpu_kernel void @private_system_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7700,46 +8088,61 @@ define amdgpu_kernel void @private_system_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8004,46 +8407,61 @@ define amdgpu_kernel void @private_system_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8308,51 +8726,66 @@ define amdgpu_kernel void @private_system_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8617,51 +9050,66 @@ define amdgpu_kernel void @private_system_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8926,51 +9374,66 @@ define amdgpu_kernel void @private_system_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9235,51 +9698,66 @@ define amdgpu_kernel void @private_system_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9544,51 +10022,66 @@ define amdgpu_kernel void @private_system_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9853,51 +10346,66 @@ define amdgpu_kernel void @private_system_relese_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_relese_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_relese_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_relese_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10162,51 +10670,66 @@ define amdgpu_kernel void @private_system_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10471,51 +10994,66 @@ define amdgpu_kernel void @private_system_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10688,36 +11226,47 @@ define amdgpu_kernel void @private_system_one_as_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("one-as") unordered, align 4
@@ -10888,36 +11437,47 @@ define amdgpu_kernel void @private_system_one_as_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("one-as") monotonic, align 4
@@ -11088,38 +11648,49 @@ define amdgpu_kernel void @private_system_one_as_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("one-as") acquire, align 4
@@ -11290,40 +11861,51 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("one-as") seq_cst, align 4
@@ -11474,36 +12056,46 @@ define amdgpu_kernel void @private_system_one_as_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("one-as") unordered, align 4
@@ -11653,36 +12245,46 @@ define amdgpu_kernel void @private_system_one_as_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("one-as") monotonic, align 4
@@ -11832,41 +12434,51 @@ define amdgpu_kernel void @private_system_one_as_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("one-as") release, align 4
@@ -12016,41 +12628,51 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("one-as") seq_cst, align 4
@@ -12200,36 +12822,46 @@ define amdgpu_kernel void @private_system_one_as_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") monotonic
@@ -12379,39 +13011,49 @@ define amdgpu_kernel void @private_system_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") acquire
@@ -12561,41 +13203,51 @@ define amdgpu_kernel void @private_system_one_as_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") release
@@ -12745,44 +13397,54 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") acq_rel
@@ -12932,44 +13594,54 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2 scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") seq_cst
@@ -13173,40 +13845,53 @@ define amdgpu_kernel void @private_system_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") acquire
@@ -13411,45 +14096,58 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") acq_rel
@@ -13654,45 +14352,58 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("one-as") seq_cst
@@ -13927,42 +14638,56 @@ define amdgpu_kernel void @private_system_one_as_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -14197,45 +14922,59 @@ define amdgpu_kernel void @private_system_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -14470,47 +15209,61 @@ define amdgpu_kernel void @private_system_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -14745,50 +15498,64 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15023,50 +15790,64 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15301,45 +16082,59 @@ define amdgpu_kernel void @private_system_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15574,45 +16369,59 @@ define amdgpu_kernel void @private_system_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15847,50 +16656,64 @@ define amdgpu_kernel void @private_system_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16125,50 +16948,64 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16403,50 +17240,64 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16681,50 +17532,64 @@ define amdgpu_kernel void @private_system_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16959,50 +17824,64 @@ define amdgpu_kernel void @private_system_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17237,50 +18116,64 @@ define amdgpu_kernel void @private_system_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17515,50 +18408,64 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17793,50 +18700,64 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3] scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18099,44 +19020,59 @@ define amdgpu_kernel void @private_system_one_as_monotonic_monotonic_ret_cmpxchg
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18401,46 +19337,61 @@ define amdgpu_kernel void @private_system_one_as_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18705,49 +19656,64 @@ define amdgpu_kernel void @private_system_one_as_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19012,51 +19978,66 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19321,51 +20302,66 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19630,46 +20626,61 @@ define amdgpu_kernel void @private_system_one_as_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19934,46 +20945,61 @@ define amdgpu_kernel void @private_system_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20238,51 +21264,66 @@ define amdgpu_kernel void @private_system_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20547,51 +21588,66 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20856,51 +21912,66 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21165,51 +22236,66 @@ define amdgpu_kernel void @private_system_one_as_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21474,51 +22560,66 @@ define amdgpu_kernel void @private_system_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21783,51 +22884,66 @@ define amdgpu_kernel void @private_system_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22092,51 +23208,66 @@ define amdgpu_kernel void @private_system_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22401,51 +23532,66 @@ define amdgpu_kernel void @private_system_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: global_wb scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: global_inv scope:SCOPE_SYS
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: global_wb scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: global_inv scope:SCOPE_SYS
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22454,3 +23600,5 @@ entry:
store i32 %val0, ptr addrspace(5) %out, align 4
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
index 2e9b915721a4e..e35c418ea1488 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
@@ -8,7 +8,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_volatile_load_0(
; GFX6-LABEL: private_volatile_load_0:
@@ -727,3 +728,6 @@ entry:
}
declare i32 @llvm.amdgcn.workitem.id.x()
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250-GAS: {{.*}}
+; GFX1250-NOGAS: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
index 2bcb47b49d74e..4e8bd3cd42bee 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_wavefront_unordered_load(
; GFX6-LABEL: private_wavefront_unordered_load:
@@ -177,36 +178,47 @@ define amdgpu_kernel void @private_wavefront_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront") unordered, align 4
@@ -377,36 +389,47 @@ define amdgpu_kernel void @private_wavefront_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront") monotonic, align 4
@@ -577,36 +600,47 @@ define amdgpu_kernel void @private_wavefront_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront") acquire, align 4
@@ -777,36 +811,47 @@ define amdgpu_kernel void @private_wavefront_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront") seq_cst, align 4
@@ -957,36 +1002,46 @@ define amdgpu_kernel void @private_wavefront_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront") unordered, align 4
@@ -1136,36 +1191,46 @@ define amdgpu_kernel void @private_wavefront_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront") monotonic, align 4
@@ -1315,36 +1380,46 @@ define amdgpu_kernel void @private_wavefront_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront") release, align 4
@@ -1494,36 +1569,46 @@ define amdgpu_kernel void @private_wavefront_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront") seq_cst, align 4
@@ -1673,36 +1758,46 @@ define amdgpu_kernel void @private_wavefront_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") monotonic
@@ -1852,36 +1947,46 @@ define amdgpu_kernel void @private_wavefront_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") acquire
@@ -2031,36 +2136,46 @@ define amdgpu_kernel void @private_wavefront_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") release
@@ -2210,36 +2325,46 @@ define amdgpu_kernel void @private_wavefront_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") acq_rel
@@ -2389,36 +2514,46 @@ define amdgpu_kernel void @private_wavefront_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") seq_cst
@@ -2622,38 +2757,51 @@ define amdgpu_kernel void @private_wavefront_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") acquire
@@ -2858,38 +3006,51 @@ define amdgpu_kernel void @private_wavefront_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") acq_rel
@@ -3094,38 +3255,51 @@ define amdgpu_kernel void @private_wavefront_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront") seq_cst
@@ -3360,42 +3534,56 @@ define amdgpu_kernel void @private_wavefront_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3630,42 +3818,56 @@ define amdgpu_kernel void @private_wavefront_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3900,42 +4102,56 @@ define amdgpu_kernel void @private_wavefront_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4170,42 +4386,56 @@ define amdgpu_kernel void @private_wavefront_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4440,42 +4670,56 @@ define amdgpu_kernel void @private_wavefront_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4710,42 +4954,56 @@ define amdgpu_kernel void @private_wavefront_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4980,42 +5238,56 @@ define amdgpu_kernel void @private_wavefront_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5250,42 +5522,56 @@ define amdgpu_kernel void @private_wavefront_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5520,42 +5806,56 @@ define amdgpu_kernel void @private_wavefront_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5790,42 +6090,56 @@ define amdgpu_kernel void @private_wavefront_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6060,42 +6374,56 @@ define amdgpu_kernel void @private_wavefront_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6330,42 +6658,56 @@ define amdgpu_kernel void @private_wavefront_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6600,42 +6942,56 @@ define amdgpu_kernel void @private_wavefront_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6870,42 +7226,56 @@ define amdgpu_kernel void @private_wavefront_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7140,42 +7510,56 @@ define amdgpu_kernel void @private_wavefront_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7438,44 +7822,59 @@ define amdgpu_kernel void @private_wavefront_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7740,44 +8139,59 @@ define amdgpu_kernel void @private_wavefront_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8042,44 +8456,59 @@ define amdgpu_kernel void @private_wavefront_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8344,44 +8773,59 @@ define amdgpu_kernel void @private_wavefront_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8646,44 +9090,59 @@ define amdgpu_kernel void @private_wavefront_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8948,44 +9407,59 @@ define amdgpu_kernel void @private_wavefront_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9250,44 +9724,59 @@ define amdgpu_kernel void @private_wavefront_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9552,44 +10041,59 @@ define amdgpu_kernel void @private_wavefront_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9854,44 +10358,59 @@ define amdgpu_kernel void @private_wavefront_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10156,44 +10675,59 @@ define amdgpu_kernel void @private_wavefront_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10458,44 +10992,59 @@ define amdgpu_kernel void @private_wavefront_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10760,44 +11309,59 @@ define amdgpu_kernel void @private_wavefront_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11062,44 +11626,59 @@ define amdgpu_kernel void @private_wavefront_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11364,44 +11943,59 @@ define amdgpu_kernel void @private_wavefront_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11666,44 +12260,59 @@ define amdgpu_kernel void @private_wavefront_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11876,36 +12485,47 @@ define amdgpu_kernel void @private_wavefront_one_as_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront-one-as") unordered, align 4
@@ -12076,36 +12696,47 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront-one-as") monotonic, align 4
@@ -12276,36 +12907,47 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront-one-as") acquire, align 4
@@ -12476,36 +13118,47 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("wavefront-one-as") seq_cst, align 4
@@ -12656,36 +13309,46 @@ define amdgpu_kernel void @private_wavefront_one_as_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront-one-as") unordered, align 4
@@ -12835,36 +13498,46 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront-one-as") monotonic, align 4
@@ -13014,36 +13687,46 @@ define amdgpu_kernel void @private_wavefront_one_as_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront-one-as") release, align 4
@@ -13193,36 +13876,46 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("wavefront-one-as") seq_cst, align 4
@@ -13372,36 +14065,46 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") monotonic
@@ -13551,36 +14254,46 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") acquire
@@ -13730,36 +14443,46 @@ define amdgpu_kernel void @private_wavefront_one_as_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") release
@@ -13909,36 +14632,46 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") acq_rel
@@ -14088,36 +14821,46 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") seq_cst
@@ -14321,38 +15064,51 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") acquire
@@ -14557,38 +15313,51 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") acq_rel
@@ -14793,38 +15562,51 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("wavefront-one-as") seq_cst
@@ -15059,42 +15841,56 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15329,42 +16125,56 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15599,42 +16409,56 @@ define amdgpu_kernel void @private_wavefront_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15869,42 +16693,56 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16139,42 +16977,56 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16409,42 +17261,56 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16679,42 +17545,56 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16949,42 +17829,56 @@ define amdgpu_kernel void @private_wavefront_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17219,42 +18113,56 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17489,42 +18397,56 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17759,42 +18681,56 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18029,42 +18965,56 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18299,42 +19249,56 @@ define amdgpu_kernel void @private_wavefront_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18569,42 +19533,56 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18839,42 +19817,56 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19137,44 +20129,59 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_monotonic_ret_cmpx
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19439,44 +20446,59 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19741,44 +20763,59 @@ define amdgpu_kernel void @private_wavefront_one_as_release_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20043,44 +21080,59 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20345,44 +21397,59 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20647,44 +21714,59 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_acquire_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20949,44 +22031,59 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21251,44 +22348,59 @@ define amdgpu_kernel void @private_wavefront_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21553,44 +22665,59 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21855,44 +22982,59 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22157,44 +23299,59 @@ define amdgpu_kernel void @private_wavefront_one_as_monotonic_seq_cst_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22459,44 +23616,59 @@ define amdgpu_kernel void @private_wavefront_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22761,44 +23933,59 @@ define amdgpu_kernel void @private_wavefront_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23063,44 +24250,59 @@ define amdgpu_kernel void @private_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23365,44 +24567,59 @@ define amdgpu_kernel void @private_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23411,3 +24628,5 @@ entry:
store i32 %val0, ptr addrspace(5) %out, align 4
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
index c3b2d44cfae2a..9c5a3891a2567 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
@@ -12,7 +12,8 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck --check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 -mattr=+globally-addressable-scratch < %s | FileCheck --check-prefixes=GFX1250,GFX1250-GAS %s
define amdgpu_kernel void @private_workgroup_unordered_load(
; GFX6-LABEL: private_workgroup_unordered_load:
@@ -177,36 +178,47 @@ define amdgpu_kernel void @private_workgroup_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup") unordered, align 4
@@ -377,36 +389,47 @@ define amdgpu_kernel void @private_workgroup_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup") monotonic, align 4
@@ -577,36 +600,47 @@ define amdgpu_kernel void @private_workgroup_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup") acquire, align 4
@@ -777,38 +811,49 @@ define amdgpu_kernel void @private_workgroup_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup") seq_cst, align 4
@@ -959,36 +1004,46 @@ define amdgpu_kernel void @private_workgroup_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup") unordered, align 4
@@ -1138,36 +1193,46 @@ define amdgpu_kernel void @private_workgroup_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup") monotonic, align 4
@@ -1317,38 +1382,48 @@ define amdgpu_kernel void @private_workgroup_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup") release, align 4
@@ -1498,38 +1573,48 @@ define amdgpu_kernel void @private_workgroup_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup") seq_cst, align 4
@@ -1679,36 +1764,46 @@ define amdgpu_kernel void @private_workgroup_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") monotonic
@@ -1858,37 +1953,47 @@ define amdgpu_kernel void @private_workgroup_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") acquire
@@ -2038,38 +2143,48 @@ define amdgpu_kernel void @private_workgroup_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") release
@@ -2219,39 +2334,49 @@ define amdgpu_kernel void @private_workgroup_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") acq_rel
@@ -2401,39 +2526,49 @@ define amdgpu_kernel void @private_workgroup_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") seq_cst
@@ -2637,38 +2772,51 @@ define amdgpu_kernel void @private_workgroup_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") acquire
@@ -2873,40 +3021,53 @@ define amdgpu_kernel void @private_workgroup_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") acq_rel
@@ -3111,40 +3272,53 @@ define amdgpu_kernel void @private_workgroup_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup") seq_cst
@@ -3379,42 +3553,56 @@ define amdgpu_kernel void @private_workgroup_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3649,43 +3837,57 @@ define amdgpu_kernel void @private_workgroup_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -3920,44 +4122,58 @@ define amdgpu_kernel void @private_workgroup_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4192,45 +4408,59 @@ define amdgpu_kernel void @private_workgroup_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4465,45 +4695,59 @@ define amdgpu_kernel void @private_workgroup_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -4738,43 +4982,57 @@ define amdgpu_kernel void @private_workgroup_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5009,43 +5267,57 @@ define amdgpu_kernel void @private_workgroup_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5280,45 +5552,59 @@ define amdgpu_kernel void @private_workgroup_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5553,45 +5839,59 @@ define amdgpu_kernel void @private_workgroup_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -5826,45 +6126,59 @@ define amdgpu_kernel void @private_workgroup_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6099,45 +6413,59 @@ define amdgpu_kernel void @private_workgroup_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6372,45 +6700,59 @@ define amdgpu_kernel void @private_workgroup_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6645,45 +6987,59 @@ define amdgpu_kernel void @private_workgroup_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -6918,45 +7274,59 @@ define amdgpu_kernel void @private_workgroup_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7191,45 +7561,59 @@ define amdgpu_kernel void @private_workgroup_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt_dscnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7492,44 +7876,59 @@ define amdgpu_kernel void @private_workgroup_monotonic_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -7794,44 +8193,59 @@ define amdgpu_kernel void @private_workgroup_acquire_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8096,46 +8510,61 @@ define amdgpu_kernel void @private_workgroup_release_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8400,46 +8829,61 @@ define amdgpu_kernel void @private_workgroup_acq_rel_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -8704,46 +9148,61 @@ define amdgpu_kernel void @private_workgroup_seq_cst_monotonic_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9008,44 +9467,59 @@ define amdgpu_kernel void @private_workgroup_monotonic_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9310,44 +9784,59 @@ define amdgpu_kernel void @private_workgroup_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9612,46 +10101,61 @@ define amdgpu_kernel void @private_workgroup_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -9916,46 +10420,61 @@ define amdgpu_kernel void @private_workgroup_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10220,46 +10739,61 @@ define amdgpu_kernel void @private_workgroup_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10524,46 +11058,61 @@ define amdgpu_kernel void @private_workgroup_monotonic_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -10828,46 +11377,61 @@ define amdgpu_kernel void @private_workgroup_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11132,46 +11696,61 @@ define amdgpu_kernel void @private_workgroup_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11436,46 +12015,61 @@ define amdgpu_kernel void @private_workgroup_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11740,46 +12334,61 @@ define amdgpu_kernel void @private_workgroup_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -11952,36 +12561,47 @@ define amdgpu_kernel void @private_workgroup_one_as_unordered_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_unordered_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_unordered_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_unordered_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup-one-as") unordered, align 4
@@ -12152,36 +12772,47 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup-one-as") monotonic, align 4
@@ -12352,36 +12983,47 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup-one-as") acquire, align 4
@@ -12552,38 +13194,49 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_load(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_load:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: flat_load_b32 v0, v[0:1]
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_load:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s1
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_load:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: flat_load_b32 v0, v[0:1]
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %in, ptr addrspace(5) %out) {
entry:
%val = load atomic i32, ptr addrspace(5) %in syncscope("workgroup-one-as") seq_cst, align 4
@@ -12734,36 +13387,46 @@ define amdgpu_kernel void @private_workgroup_one_as_unordered_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_unordered_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_unordered_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_unordered_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup-one-as") unordered, align 4
@@ -12913,36 +13576,46 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup-one-as") monotonic, align 4
@@ -13092,38 +13765,48 @@ define amdgpu_kernel void @private_workgroup_one_as_release_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup-one-as") release, align 4
@@ -13273,38 +13956,48 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_store(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_store:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_store_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_store:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_store:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_store_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
i32 %in, ptr addrspace(5) %out) {
entry:
store atomic i32 %in, ptr addrspace(5) %out syncscope("workgroup-one-as") seq_cst, align 4
@@ -13454,36 +14147,46 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") monotonic
@@ -13633,37 +14336,47 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") acquire
@@ -13813,38 +14526,48 @@ define amdgpu_kernel void @private_workgroup_one_as_release_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") release
@@ -13994,39 +14717,49 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") acq_rel
@@ -14176,39 +14909,49 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s1, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
-; GFX1250-NEXT: s_mov_b32 s1, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s1, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s2
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s1, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s2, s3
-; GFX1250-NEXT: s_cselect_b32 s2, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s1, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s0
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v[0:1], v2
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v0, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s1
+; GFX1250-GAS-NEXT: s_mov_b32 s1, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s1, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s2, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s2, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s1, v2, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s1, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s1, v0, s2
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s0
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v[0:1], v2
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") seq_cst
@@ -14412,38 +15155,51 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") acquire
@@ -14648,40 +15404,53 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") acq_rel
@@ -14886,40 +15655,53 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_ret_atomicrmw(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_ret_atomicrmw:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s0
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s3, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s0, s3
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0
+; GFX1250-NOGAS-NEXT: v_mov_b32_e32 v1, s1
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[2:3], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[2:3]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s3, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s0, s3
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_swap_b32 v0, v[0:1], v2 th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg ptr addrspace(5) %out, i32 %in syncscope("workgroup-one-as") seq_cst
@@ -15154,42 +15936,56 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15424,43 +16220,57 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15695,44 +16505,58 @@ define amdgpu_kernel void @private_workgroup_one_as_release_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -15967,45 +16791,59 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16240,45 +17078,59 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_monotonic_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_monotonic_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16513,43 +17365,57 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -16784,43 +17650,57 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17055,45 +17935,59 @@ define amdgpu_kernel void @private_workgroup_one_as_release_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17328,45 +18222,59 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17601,45 +18509,59 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_acquire_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_acquire_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -17874,45 +18796,59 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18147,45 +19083,59 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18420,45 +19370,59 @@ define amdgpu_kernel void @private_workgroup_one_as_release_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18693,45 +19657,59 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -18966,45 +19944,59 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_seq_cst_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0 offset:16
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s3, s2, s3
-; GFX1250-NEXT: s_mov_b32 s2, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
-; GFX1250-NEXT: s_mov_b32 s2, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s2, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s3
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[4:5], 0
-; GFX1250-NEXT: s_mov_b32 s2, s5
-; GFX1250-NEXT: s_mov_b32 s6, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s3, s6
-; GFX1250-NEXT: s_cselect_b32 s3, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s2, s4
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s1
-; GFX1250-NEXT: v_mov_b32_e32 v4, s0
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v0, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s3, s2, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s2
+; GFX1250-GAS-NEXT: s_mov_b32 s2, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s2, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s3
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[4:5]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[4:5], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s5
+; GFX1250-GAS-NEXT: s_mov_b32 s6, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s3, s6
+; GFX1250-GAS-NEXT: s_cselect_b32 s3, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s2, v2, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s2, s4
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s2, v0, s3
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s1
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s0
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v[0:1], v[2:3]
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19267,44 +20259,59 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_monotonic_ret_cmpx
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19569,44 +20576,59 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -19871,46 +20893,61 @@ define amdgpu_kernel void @private_workgroup_one_as_release_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20175,46 +21212,61 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20479,46 +21531,61 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_monotonic_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -20783,44 +21850,59 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_acquire_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21085,44 +22167,59 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21387,46 +22484,61 @@ define amdgpu_kernel void @private_workgroup_one_as_release_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21691,46 +22803,61 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -21995,46 +23122,61 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_acquire_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22299,46 +23441,61 @@ define amdgpu_kernel void @private_workgroup_one_as_monotonic_seq_cst_ret_cmpxch
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22603,46 +23760,61 @@ define amdgpu_kernel void @private_workgroup_one_as_acquire_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -22907,46 +24079,61 @@ define amdgpu_kernel void @private_workgroup_one_as_release_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_release_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23211,46 +24398,61 @@ define amdgpu_kernel void @private_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23515,46 +24717,61 @@ define amdgpu_kernel void @private_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg(
; GFX12-CU-NEXT: scratch_store_b32 off, v0, s0
; GFX12-CU-NEXT: s_endpgm
;
-; GFX1250-LABEL: private_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
-; GFX1250: ; %bb.0: ; %entry
-; GFX1250-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
-; GFX1250-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
-; GFX1250-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
-; GFX1250-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
-; GFX1250-NEXT: s_mov_b32 s3, 16
-; GFX1250-NEXT: s_wait_kmcnt 0x0
-; GFX1250-NEXT: s_add_co_i32 s4, s0, s3
-; GFX1250-NEXT: s_mov_b32 s3, 0
-; GFX1250-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
-; GFX1250-NEXT: s_mov_b32 s3, 20
-; GFX1250-NEXT: v_lshlrev_b32_e64 v2, s3, v0
-; GFX1250-NEXT: v_mov_b32_e32 v0, s4
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
-; GFX1250-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
-; GFX1250-NEXT: v_mov_b32_e32 v2, v1
-; GFX1250-NEXT: s_mov_b64 s[6:7], 0
-; GFX1250-NEXT: s_mov_b32 s3, s7
-; GFX1250-NEXT: s_mov_b32 s5, -1
-; GFX1250-NEXT: s_cmp_lg_u32 s4, s5
-; GFX1250-NEXT: s_cselect_b32 s4, -1, 0
-; GFX1250-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: s_mov_b32 s3, s6
-; GFX1250-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
-; GFX1250-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v1, v2
-; GFX1250-NEXT: v_mov_b32_e32 v2, s2
-; GFX1250-NEXT: v_mov_b32_e32 v4, s1
-; GFX1250-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
-; GFX1250-NEXT: v_mov_b32_e32 v3, v4
-; GFX1250-NEXT: s_wait_loadcnt 0x0
-; GFX1250-NEXT: s_wait_storecnt 0x0
-; GFX1250-NEXT: s_wait_xcnt 0x0
-; GFX1250-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
-; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
-; GFX1250-NEXT: scratch_store_b32 off, v0, s0
-; GFX1250-NEXT: s_endpgm
+; GFX1250-NOGAS-LABEL: private_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-NOGAS: ; %bb.0: ; %entry
+; GFX1250-NOGAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-NOGAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s1, s[4:5], 0x4 nv
+; GFX1250-NOGAS-NEXT: s_load_b32 s2, s[4:5], 0x8 nv
+; GFX1250-NOGAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NOGAS-NEXT: scratch_load_b32 v0, off, s0 offset:16
+; GFX1250-NOGAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-NOGAS-NEXT: v_cmp_eq_u32_e64 s2, v0, s2
+; GFX1250-NOGAS-NEXT: v_cndmask_b32_e64 v1, v0, s1, s2
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v1, s0 offset:16
+; GFX1250-NOGAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-NOGAS-NEXT: s_endpgm
+;
+; GFX1250-GAS-LABEL: private_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX1250-GAS: ; %bb.0: ; %entry
+; GFX1250-GAS-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 1 ; msbs: dst=0 src0=0 src1=0 src2=0
+; GFX1250-GAS-NEXT: s_load_b32 s0, s[4:5], 0x0 nv
+; GFX1250-GAS-NEXT: s_load_b32 s2, s[4:5], 0x4 nv
+; GFX1250-GAS-NEXT: s_load_b32 s1, s[4:5], 0x8 nv
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 16
+; GFX1250-GAS-NEXT: s_wait_kmcnt 0x0
+; GFX1250-GAS-NEXT: s_add_co_i32 s4, s0, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 0
+; GFX1250-GAS-NEXT: v_mbcnt_lo_u32_b32 v0, -1, s3
+; GFX1250-GAS-NEXT: s_mov_b32 s3, 20
+; GFX1250-GAS-NEXT: v_lshlrev_b32_e64 v2, s3, v0
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v0, s4
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], src_flat_scratch_base_lo
+; GFX1250-GAS-NEXT: v_add_nc_u64_e64 v[0:1], v[0:1], s[6:7]
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, v1
+; GFX1250-GAS-NEXT: s_mov_b64 s[6:7], 0
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s7
+; GFX1250-GAS-NEXT: s_mov_b32 s5, -1
+; GFX1250-GAS-NEXT: s_cmp_lg_u32 s4, s5
+; GFX1250-GAS-NEXT: s_cselect_b32 s4, -1, 0
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v2, s3, v2, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: s_mov_b32 s3, s6
+; GFX1250-GAS-NEXT: v_cndmask_b32_e64 v0, s3, v0, s4
+; GFX1250-GAS-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v1, v2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v2, s2
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v4, s1
+; GFX1250-GAS-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec
+; GFX1250-GAS-NEXT: v_mov_b32_e32 v3, v4
+; GFX1250-GAS-NEXT: s_wait_loadcnt 0x0
+; GFX1250-GAS-NEXT: s_wait_storecnt 0x0
+; GFX1250-GAS-NEXT: s_wait_xcnt 0x0
+; GFX1250-GAS-NEXT: flat_atomic_cmpswap_b32 v0, v[0:1], v[2:3] th:TH_ATOMIC_RETURN
+; GFX1250-GAS-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-GAS-NEXT: scratch_store_b32 off, v0, s0
+; GFX1250-GAS-NEXT: s_endpgm
ptr addrspace(5) %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, ptr addrspace(5) %out, i32 4
@@ -23563,3 +24780,5 @@ entry:
store i32 %val0, ptr addrspace(5) %out, align 4
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll
index 2cb805ba873a8..00c29b5875217 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll
@@ -1,6 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GFX1200 %s
-; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GFX1250 %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GFX1250,GFX1250-NOGAS %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -mattr=+globally-addressable-scratch -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GFX1250,GFX1250-GAS %s
define void @system_atomic_store_unordered_float(ptr addrspace(5) %addr, float %val) {
; GFX1200-LABEL: define void @system_atomic_store_unordered_float(
@@ -8,11 +9,16 @@ define void @system_atomic_store_unordered_float(ptr addrspace(5) %addr, float %
; GFX1200-NEXT: store float [[VAL]], ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret void
;
-; GFX1250-LABEL: define void @system_atomic_store_unordered_float(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], float [[VAL:%.*]]) #[[ATTR0:[0-9]+]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: store atomic float [[VAL]], ptr [[SCRATCH_ASCAST]] unordered, align 4
-; GFX1250-NEXT: ret void
+; GFX1250-NOGAS-LABEL: define void @system_atomic_store_unordered_float(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], float [[VAL:%.*]]) #[[ATTR0:[0-9]+]] {
+; GFX1250-NOGAS-NEXT: store float [[VAL]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret void
+;
+; GFX1250-GAS-LABEL: define void @system_atomic_store_unordered_float(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], float [[VAL:%.*]]) #[[ATTR0:[0-9]+]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: store atomic float [[VAL]], ptr [[SCRATCH_ASCAST]] unordered, align 4
+; GFX1250-GAS-NEXT: ret void
;
store atomic float %val, ptr addrspace(5) %addr unordered, align 4
ret void
@@ -24,11 +30,16 @@ define void @system_atomic_store_unordered_i32(ptr addrspace(5) %addr, i32 %val)
; GFX1200-NEXT: store i32 [[VAL]], ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret void
;
-; GFX1250-LABEL: define void @system_atomic_store_unordered_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: store atomic i32 [[VAL]], ptr [[SCRATCH_ASCAST]] unordered, align 4
-; GFX1250-NEXT: ret void
+; GFX1250-NOGAS-LABEL: define void @system_atomic_store_unordered_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: store i32 [[VAL]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret void
+;
+; GFX1250-GAS-LABEL: define void @system_atomic_store_unordered_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: store atomic i32 [[VAL]], ptr [[SCRATCH_ASCAST]] unordered, align 4
+; GFX1250-GAS-NEXT: ret void
;
store atomic i32 %val, ptr addrspace(5) %addr unordered, align 4
ret void
@@ -40,11 +51,16 @@ define void @system_atomic_store_release_i32(ptr addrspace(5) %addr, i32 %val) {
; GFX1200-NEXT: store i32 [[VAL]], ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret void
;
-; GFX1250-LABEL: define void @system_atomic_store_release_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: store atomic i32 [[VAL]], ptr [[SCRATCH_ASCAST]] release, align 4
-; GFX1250-NEXT: ret void
+; GFX1250-NOGAS-LABEL: define void @system_atomic_store_release_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: store i32 [[VAL]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret void
+;
+; GFX1250-GAS-LABEL: define void @system_atomic_store_release_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: store atomic i32 [[VAL]], ptr [[SCRATCH_ASCAST]] release, align 4
+; GFX1250-GAS-NEXT: ret void
;
store atomic i32 %val, ptr addrspace(5) %addr release, align 4
ret void
@@ -56,11 +72,16 @@ define void @workgroup_atomic_store_release_i32(ptr addrspace(5) %addr, i32 %val
; GFX1200-NEXT: store i32 [[VAL]], ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret void
;
-; GFX1250-LABEL: define void @workgroup_atomic_store_release_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: store atomic i32 [[VAL]], ptr [[SCRATCH_ASCAST]] syncscope("workgroup") release, align 4
-; GFX1250-NEXT: ret void
+; GFX1250-NOGAS-LABEL: define void @workgroup_atomic_store_release_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: store i32 [[VAL]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret void
+;
+; GFX1250-GAS-LABEL: define void @workgroup_atomic_store_release_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[VAL:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: store atomic i32 [[VAL]], ptr [[SCRATCH_ASCAST]] syncscope("workgroup") release, align 4
+; GFX1250-GAS-NEXT: ret void
;
store atomic i32 %val, ptr addrspace(5) %addr syncscope("workgroup") release, align 4
ret void
@@ -72,11 +93,16 @@ define float @system_atomic_load_unordered_float(ptr addrspace(5) %addr) {
; GFX1200-NEXT: [[VAL:%.*]] = load float, ptr addrspace(5) [[ADDR]], align 4, !invariant.load [[META0:![0-9]+]], !nontemporal [[META1:![0-9]+]]
; GFX1200-NEXT: ret float [[VAL]]
;
-; GFX1250-LABEL: define float @system_atomic_load_unordered_float(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = load atomic float, ptr [[SCRATCH_ASCAST]] unordered, align 4, !invariant.load [[META0:![0-9]+]], !nontemporal [[META1:![0-9]+]]
-; GFX1250-NEXT: ret float [[VAL]]
+; GFX1250-NOGAS-LABEL: define float @system_atomic_load_unordered_float(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[VAL:%.*]] = load float, ptr addrspace(5) [[ADDR]], align 4, !invariant.load [[META0:![0-9]+]], !nontemporal [[META1:![0-9]+]]
+; GFX1250-NOGAS-NEXT: ret float [[VAL]]
+;
+; GFX1250-GAS-LABEL: define float @system_atomic_load_unordered_float(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = load atomic float, ptr [[SCRATCH_ASCAST]] unordered, align 4, !invariant.load [[META0:![0-9]+]], !nontemporal [[META1:![0-9]+]]
+; GFX1250-GAS-NEXT: ret float [[VAL]]
;
%val = load atomic float, ptr addrspace(5) %addr unordered, align 4, !invariant.load !1, !nontemporal !0
ret float %val
@@ -88,11 +114,16 @@ define i32 @system_atomic_load_unordered_i32(ptr addrspace(5) %addr) {
; GFX1200-NEXT: [[VAL:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret i32 [[VAL]]
;
-; GFX1250-LABEL: define i32 @system_atomic_load_unordered_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = load atomic i32, ptr [[SCRATCH_ASCAST]] unordered, align 4
-; GFX1250-NEXT: ret i32 [[VAL]]
+; GFX1250-NOGAS-LABEL: define i32 @system_atomic_load_unordered_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[VAL:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret i32 [[VAL]]
+;
+; GFX1250-GAS-LABEL: define i32 @system_atomic_load_unordered_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = load atomic i32, ptr [[SCRATCH_ASCAST]] unordered, align 4
+; GFX1250-GAS-NEXT: ret i32 [[VAL]]
;
%val = load atomic i32, ptr addrspace(5) %addr unordered, align 4
ret i32 %val
@@ -104,11 +135,16 @@ define i32 @system_atomic_load_acquire_i32(ptr addrspace(5) %addr) {
; GFX1200-NEXT: [[VAL:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret i32 [[VAL]]
;
-; GFX1250-LABEL: define i32 @system_atomic_load_acquire_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = load atomic i32, ptr [[SCRATCH_ASCAST]] acquire, align 4
-; GFX1250-NEXT: ret i32 [[VAL]]
+; GFX1250-NOGAS-LABEL: define i32 @system_atomic_load_acquire_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[VAL:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret i32 [[VAL]]
+;
+; GFX1250-GAS-LABEL: define i32 @system_atomic_load_acquire_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = load atomic i32, ptr [[SCRATCH_ASCAST]] acquire, align 4
+; GFX1250-GAS-NEXT: ret i32 [[VAL]]
;
%val = load atomic i32, ptr addrspace(5) %addr acquire, align 4
ret i32 %val
@@ -120,11 +156,16 @@ define i32 @workgroup_atomic_load_acquire_i32(ptr addrspace(5) %addr) {
; GFX1200-NEXT: [[VAL:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret i32 [[VAL]]
;
-; GFX1250-LABEL: define i32 @workgroup_atomic_load_acquire_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = load atomic i32, ptr [[SCRATCH_ASCAST]] syncscope("workgroup") acquire, align 4
-; GFX1250-NEXT: ret i32 [[VAL]]
+; GFX1250-NOGAS-LABEL: define i32 @workgroup_atomic_load_acquire_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[VAL:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret i32 [[VAL]]
+;
+; GFX1250-GAS-LABEL: define i32 @workgroup_atomic_load_acquire_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = load atomic i32, ptr [[SCRATCH_ASCAST]] syncscope("workgroup") acquire, align 4
+; GFX1250-GAS-NEXT: ret i32 [[VAL]]
;
%val = load atomic i32, ptr addrspace(5) %addr syncscope("workgroup") acquire, align 4
ret i32 %val
@@ -142,12 +183,23 @@ define i32 @system_atomic_cmpxchg_acq_rel_acquire_i32(ptr addrspace(5) %addr, i3
; GFX1200-NEXT: [[RES:%.*]] = extractvalue { i32, i1 } [[TMP5]], 0
; GFX1200-NEXT: ret i32 [[RES]]
;
-; GFX1250-LABEL: define i32 @system_atomic_cmpxchg_acq_rel_acquire_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[OLD:%.*]], i32 [[IN:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = cmpxchg volatile ptr [[SCRATCH_ASCAST]], i32 [[OLD]], i32 [[IN]] acq_rel acquire, align 4, !nontemporal [[META1]]
-; GFX1250-NEXT: [[RES:%.*]] = extractvalue { i32, i1 } [[VAL]], 0
-; GFX1250-NEXT: ret i32 [[RES]]
+; GFX1250-NOGAS-LABEL: define i32 @system_atomic_cmpxchg_acq_rel_acquire_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[OLD:%.*]], i32 [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[TMP1:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: [[TMP2:%.*]] = icmp eq i32 [[TMP1]], [[OLD]]
+; GFX1250-NOGAS-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i32 [[IN]], i32 [[TMP1]]
+; GFX1250-NOGAS-NEXT: store i32 [[TMP3]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: [[TMP4:%.*]] = insertvalue { i32, i1 } poison, i32 [[TMP1]], 0
+; GFX1250-NOGAS-NEXT: [[TMP5:%.*]] = insertvalue { i32, i1 } [[TMP4]], i1 [[TMP2]], 1
+; GFX1250-NOGAS-NEXT: [[RES:%.*]] = extractvalue { i32, i1 } [[TMP5]], 0
+; GFX1250-NOGAS-NEXT: ret i32 [[RES]]
+;
+; GFX1250-GAS-LABEL: define i32 @system_atomic_cmpxchg_acq_rel_acquire_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[OLD:%.*]], i32 [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = cmpxchg volatile ptr [[SCRATCH_ASCAST]], i32 [[OLD]], i32 [[IN]] acq_rel acquire, align 4, !nontemporal [[META1]]
+; GFX1250-GAS-NEXT: [[RES:%.*]] = extractvalue { i32, i1 } [[VAL]], 0
+; GFX1250-GAS-NEXT: ret i32 [[RES]]
;
%val = cmpxchg volatile ptr addrspace(5) %addr, i32 %old, i32 %in acq_rel acquire, !nontemporal !0
%res = extractvalue { i32, i1 } %val, 0
@@ -161,11 +213,17 @@ define i32 @system_atomicrmw_xchg_acq_rel_i32(ptr addrspace(5) %addr, i32 %in) {
; GFX1200-NEXT: store i32 [[IN]], ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret i32 [[TMP1]]
;
-; GFX1250-LABEL: define i32 @system_atomicrmw_xchg_acq_rel_i32(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[IN:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = atomicrmw volatile xchg ptr [[SCRATCH_ASCAST]], i32 [[IN]] acq_rel, align 4
-; GFX1250-NEXT: ret i32 [[VAL]]
+; GFX1250-NOGAS-LABEL: define i32 @system_atomicrmw_xchg_acq_rel_i32(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[TMP1:%.*]] = load i32, ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: store i32 [[IN]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret i32 [[TMP1]]
+;
+; GFX1250-GAS-LABEL: define i32 @system_atomicrmw_xchg_acq_rel_i32(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i32 [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = atomicrmw volatile xchg ptr [[SCRATCH_ASCAST]], i32 [[IN]] acq_rel, align 4
+; GFX1250-GAS-NEXT: ret i32 [[VAL]]
;
%val = atomicrmw volatile xchg ptr addrspace(5) %addr, i32 %in acq_rel
ret i32 %val
@@ -178,11 +236,17 @@ define i16 @system_atomicrmw_xchg_acq_rel_i16(ptr addrspace(5) %addr, i16 %in) {
; GFX1200-NEXT: store i16 [[IN]], ptr addrspace(5) [[ADDR]], align 2
; GFX1200-NEXT: ret i16 [[TMP1]]
;
-; GFX1250-LABEL: define i16 @system_atomicrmw_xchg_acq_rel_i16(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], i16 [[IN:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = atomicrmw volatile xchg ptr [[SCRATCH_ASCAST]], i16 [[IN]] acq_rel, align 2
-; GFX1250-NEXT: ret i16 [[VAL]]
+; GFX1250-NOGAS-LABEL: define i16 @system_atomicrmw_xchg_acq_rel_i16(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i16 [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[TMP1:%.*]] = load i16, ptr addrspace(5) [[ADDR]], align 2
+; GFX1250-NOGAS-NEXT: store i16 [[IN]], ptr addrspace(5) [[ADDR]], align 2
+; GFX1250-NOGAS-NEXT: ret i16 [[TMP1]]
+;
+; GFX1250-GAS-LABEL: define i16 @system_atomicrmw_xchg_acq_rel_i16(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], i16 [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = atomicrmw volatile xchg ptr [[SCRATCH_ASCAST]], i16 [[IN]] acq_rel, align 2
+; GFX1250-GAS-NEXT: ret i16 [[VAL]]
;
%val = atomicrmw volatile xchg ptr addrspace(5) %addr, i16 %in acq_rel
ret i16 %val
@@ -196,11 +260,18 @@ define half @system_atomicrmw_fmax_acq_rel_half(ptr addrspace(5) %addr, half %in
; GFX1200-NEXT: store half [[TMP2]], ptr addrspace(5) [[ADDR]], align 2
; GFX1200-NEXT: ret half [[TMP1]]
;
-; GFX1250-LABEL: define half @system_atomicrmw_fmax_acq_rel_half(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], half [[IN:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = atomicrmw volatile fmax ptr [[SCRATCH_ASCAST]], half [[IN]] acq_rel, align 2
-; GFX1250-NEXT: ret half [[VAL]]
+; GFX1250-NOGAS-LABEL: define half @system_atomicrmw_fmax_acq_rel_half(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], half [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[TMP1:%.*]] = load half, ptr addrspace(5) [[ADDR]], align 2
+; GFX1250-NOGAS-NEXT: [[TMP2:%.*]] = call half @llvm.maxnum.f16(half [[TMP1]], half [[IN]])
+; GFX1250-NOGAS-NEXT: store half [[TMP2]], ptr addrspace(5) [[ADDR]], align 2
+; GFX1250-NOGAS-NEXT: ret half [[TMP1]]
+;
+; GFX1250-GAS-LABEL: define half @system_atomicrmw_fmax_acq_rel_half(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], half [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = atomicrmw volatile fmax ptr [[SCRATCH_ASCAST]], half [[IN]] acq_rel, align 2
+; GFX1250-GAS-NEXT: ret half [[VAL]]
;
%val = atomicrmw volatile fmax ptr addrspace(5) %addr, half %in acq_rel
ret half %val
@@ -214,11 +285,18 @@ define float @system_atomicrmw_fminimum_acq_rel_float(ptr addrspace(5) %addr, fl
; GFX1200-NEXT: store float [[TMP2]], ptr addrspace(5) [[ADDR]], align 4
; GFX1200-NEXT: ret float [[TMP1]]
;
-; GFX1250-LABEL: define float @system_atomicrmw_fminimum_acq_rel_float(
-; GFX1250-SAME: ptr addrspace(5) [[ADDR:%.*]], float [[IN:%.*]]) #[[ATTR0]] {
-; GFX1250-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
-; GFX1250-NEXT: [[VAL:%.*]] = atomicrmw volatile fminimum ptr [[SCRATCH_ASCAST]], float [[IN]] acq_rel, align 4, !nontemporal [[META1]]
-; GFX1250-NEXT: ret float [[VAL]]
+; GFX1250-NOGAS-LABEL: define float @system_atomicrmw_fminimum_acq_rel_float(
+; GFX1250-NOGAS-SAME: ptr addrspace(5) [[ADDR:%.*]], float [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-NOGAS-NEXT: [[TMP1:%.*]] = load float, ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: [[TMP2:%.*]] = call float @llvm.minimum.f32(float [[TMP1]], float [[IN]])
+; GFX1250-NOGAS-NEXT: store float [[TMP2]], ptr addrspace(5) [[ADDR]], align 4
+; GFX1250-NOGAS-NEXT: ret float [[TMP1]]
+;
+; GFX1250-GAS-LABEL: define float @system_atomicrmw_fminimum_acq_rel_float(
+; GFX1250-GAS-SAME: ptr addrspace(5) [[ADDR:%.*]], float [[IN:%.*]]) #[[ATTR0]] {
+; GFX1250-GAS-NEXT: [[SCRATCH_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ADDR]] to ptr
+; GFX1250-GAS-NEXT: [[VAL:%.*]] = atomicrmw volatile fminimum ptr [[SCRATCH_ASCAST]], float [[IN]] acq_rel, align 4, !nontemporal [[META1]]
+; GFX1250-GAS-NEXT: ret float [[VAL]]
;
%val = atomicrmw volatile fminimum ptr addrspace(5) %addr, float %in acq_rel, !nontemporal !0
ret float %val
@@ -230,6 +308,11 @@ define float @system_atomicrmw_fminimum_acq_rel_float(ptr addrspace(5) %addr, fl
; GFX1200: [[META0]] = !{i32 1}
; GFX1200: [[META1]] = !{}
;.
-; GFX1250: [[META0]] = !{i32 1}
-; GFX1250: [[META1]] = !{}
+; GFX1250-NOGAS: [[META0]] = !{i32 1}
+; GFX1250-NOGAS: [[META1]] = !{}
+;.
+; GFX1250-GAS: [[META0]] = !{i32 1}
+; GFX1250-GAS: [[META1]] = !{}
;.
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX1250: {{.*}}