[clang] [llvm] [RFC][AMDGPU] Add OpenCL-specific fence address space masks (PR #78572)
Pierre van Houtryve via cfe-commits
cfe-commits at lists.llvm.org
Tue Apr 2 01:53:50 PDT 2024
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/78572
>From 05faf0092411301f41ce417c61c205e91bd467b5 Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Wed, 17 Jan 2024 11:20:09 +0100
Subject: [PATCH 1/4] [RFC] Memory Model Relaxation Annotations
Work-in-progress patch to implement the core/target-agnostic components of Memory Model Relaxation Annotations.
This diff is mostly complete but likely has a few holes, especially in codegen (MMRAs aren't always carried over to MI layer I believe).
Most of the work needs to be done in adding tests and analyzing the outputs to find what's missing.
---
.../docs/MemoryModelRelaxationAnnotations.rst | 480 ++++++++++++++++++
llvm/docs/Reference.rst | 4 +
llvm/include/llvm/Analysis/VectorUtils.h | 2 +-
.../CodeGen/GlobalISel/MachineIRBuilder.h | 9 +
llvm/include/llvm/CodeGen/MachineFunction.h | 3 +-
llvm/include/llvm/CodeGen/MachineInstr.h | 43 +-
.../llvm/CodeGen/MachineInstrBuilder.h | 45 +-
llvm/include/llvm/CodeGen/SelectionDAG.h | 11 +
llvm/include/llvm/IR/FixedMetadataKinds.def | 1 +
llvm/include/llvm/IR/IRBuilder.h | 5 +-
.../IR/MemoryModelRelaxationAnnotations.h | 109 ++++
llvm/lib/Analysis/VectorUtils.cpp | 11 +-
llvm/lib/CodeGen/AtomicExpandPass.cpp | 2 +-
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | 1 +
.../CodeGen/GlobalISel/MachineIRBuilder.cpp | 4 +-
llvm/lib/CodeGen/MIRPrinter.cpp | 7 +
llvm/lib/CodeGen/MachineFunction.cpp | 4 +-
llvm/lib/CodeGen/MachineInstr.cpp | 49 +-
.../SelectionDAG/ScheduleDAGSDNodes.cpp | 9 +
.../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 2 +-
.../SelectionDAG/SelectionDAGBuilder.cpp | 17 +-
.../SelectionDAG/SelectionDAGDumper.cpp | 7 +
.../CodeGen/SelectionDAG/SelectionDAGISel.cpp | 2 +
llvm/lib/IR/CMakeLists.txt | 1 +
llvm/lib/IR/IRBuilder.cpp | 9 +
llvm/lib/IR/Instruction.cpp | 4 +-
.../IR/MemoryModelRelaxationAnnotations.cpp | 186 +++++++
llvm/lib/IR/Verifier.cpp | 31 ++
.../Scalar/MergedLoadStoreMotion.cpp | 2 +-
llvm/lib/Transforms/Utils/Local.cpp | 17 +-
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 10 +-
llvm/test/CodeGen/AMDGPU/GlobalISel/mmra.ll | 34 ++
llvm/test/CodeGen/AMDGPU/mmra.ll | 189 +++++++
.../AtomicExpand/AMDGPU/expand-atomic-mmra.ll | 204 ++++++++
llvm/test/Transforms/SimplifyCFG/mmra.ll | 150 ++++++
llvm/test/Verifier/mmra-allowed.ll | 31 ++
llvm/test/Verifier/mmra.ll | 43 ++
llvm/unittests/CodeGen/MachineInstrTest.cpp | 52 ++
llvm/unittests/IR/CMakeLists.txt | 1 +
.../MemoryModelRelaxationAnnotationsTest.cpp | 285 +++++++++++
40 files changed, 2017 insertions(+), 59 deletions(-)
create mode 100644 llvm/docs/MemoryModelRelaxationAnnotations.rst
create mode 100644 llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
create mode 100644 llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/mmra.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/mmra.ll
create mode 100644 llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll
create mode 100644 llvm/test/Transforms/SimplifyCFG/mmra.ll
create mode 100644 llvm/test/Verifier/mmra-allowed.ll
create mode 100644 llvm/test/Verifier/mmra.ll
create mode 100644 llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
diff --git a/llvm/docs/MemoryModelRelaxationAnnotations.rst b/llvm/docs/MemoryModelRelaxationAnnotations.rst
new file mode 100644
index 00000000000000..1b90f9cf9f2578
--- /dev/null
+++ b/llvm/docs/MemoryModelRelaxationAnnotations.rst
@@ -0,0 +1,480 @@
+===================================
+Memory Model Relaxation Annotations
+===================================
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+Memory Model Relaxation Annotations (MMRAs) are target-defined properties
+on instructions that can be used to selectively relax constraints placed
+by the memory model. For example:
+
+* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
+ memory operations to be reordered across ``acquire`` or ``release``
+ operations.
+* OpenCL APIs expose primitives to only fence a specific set of address
+ spaces, carrying that information to the backend can enable the
+ use of faster synchronization instructions, rather than fencing all
+ address spaces.
+
+MMRAs offer an opt-in system for targets to relax the default LLVM
+memory model.
+As such, they are attached to an operation using LLVM metadata which
+can always be dropped without affecting correctness.
+
+Definitions
+===========
+
+memory operation
+ A load, a store, an atomic, or a function call that is marked as
+ accessing memory.
+
+synchronizing operation
+ An instruction that synchronizes memory with other threads (e.g.
+ an atomic or a fence).
+
+tag
+ Metadata attached to a memory or synchronizing operation
+ that represents some target-defined property regarding memory
+ synchronization.
+
+ An operation may have multiple tags that each represent a different
+ property.
+
+ A tag is composed of a pair of metadata string: a *prefix and a *suffix*.
+
+ In LLVM IR, the pair is represented using a metadata tuple.
+ In other cases (comments, documentation, etc.), we may use the
+ ``prefix:suffix`` notation.
+ For example:
+
+ .. code-block::
+ :caption: Example: Tags in Metadata
+
+ !0 = !{!"scope", !"workgroup"} # scope:workgroup
+ !1 = !{!"scope", !"device"} # scope:device
+ !2 = !{!"scope", !"system"} # scope:system
+
+ .. note::
+
+ The only semantics relevant to the optimizer is the
+ "compatibility" relation defined below. All other
+ semantics are target defined.
+
+ Tags can also be organised in lists to allow operations
+ to specify all of the tags they belong to. Such a list
+ is referred to as a "set of tags".
+
+ .. code-block::
+ :caption: Example: Set of Tags in Metadata
+
+ !0 = !{!"scope", !"workgroup"}
+ !1 = !{!"sync-as", !"private"}
+ !2 = !{!0, !2}
+
+ .. note::
+
+ If an operation does not have MMRA metadata, it's treated as if
+ it has an empty list (``!{}``) of tags.
+
+ Note that it is not an error if a tag is not recognized by the
+ instruction it is applied to, or by the current target.
+ Such tags are simply ignored.
+
+ Both synchronizing operations and memory operations can have
+ zero or more tags attached to them using the ``!mmra`` syntax.
+
+ For the sake of readability in examples below,
+ we use a (non-functional) short syntax to represent MMMRA metadata:
+
+ .. code-block::
+ :caption: Short Syntax Example
+
+ store %ptr1 # foo:bar
+ store %ptr1 !mmra !{!"foo", !"bar"}
+
+ These two notations can be used in this document and are strictly
+ equivalent. However, only the second version is functional.
+
+compatibility
+ Two sets of tags are said to be *compatible* iff, for every unique
+ tag prefix P present in at least one set:
+
+ - the other set contains no tag with prefix P, or
+ - at least one tag with prefix P is common to both sets.
+
+ The above definition implies that an empty set is always compatible
+ with any other set. This is an important property as it ensures that
+ if a transform drops the metadata on an operation, it can never affect
+ correctness. In other words, the memory model cannot be relaxed further
+ by deleting metadata from instructions.
+
+.. _HappensBefore:
+
+The *happens-before* Relation
+==============================
+
+Compatibility checks can be used to opt out of the *happens-before* relation
+established between two instructions.
+
+Ordering
+ When two instructions' metadata are not compatible, any program order
+ between them are not in *happens-before*.
+
+ For example, consider two tags ``foo:bar`` and
+ ``foo:baz`` exposed by a target:
+
+ .. code-block::
+
+ A: store %ptr1 # foo:bar
+ B: store %ptr2 # foo:baz
+ X: store atomic release %ptr3 # foo:bar
+
+ In the above figure, ``A`` is compatible with ``X``, and hence ``A``
+ happens-before ``X``. But ``B`` is not compatible with
+ ``X``, and hence it is not happens-before ``X``.
+
+Synchronization
+ If an synchronizing operation has one or more tags, then whether it
+ synchronizes-with and participates in the ``seq_cst`` order with
+ other operations is target dependent.
+
+ .. code-block::
+
+ ; Depending on the semantics of foo:bar & foo:bux, this may not
+ ; synchronize with another sequence.
+ fence release # foo:bar
+ store atomic %ptr1 # foo:bux
+
+Examples
+--------
+
+.. code-block:: text
+ :caption: Example 1
+
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:0 vulkan:nonprivate
+
+A and B are not ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+
+Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
+e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to
+only synchronize with operations happening in ``addrspace(0)``.
+
+.. code-block:: text
+ :caption: Example 2
+
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:1 vulkan:nonprivate
+
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+
+Note that A and B may or may not be in *happens-before* due to other reasons.
+
+.. code-block:: text
+ :caption: Example 3
+
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # vulkan:nonprivate
+
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+
+.. code-block:: text
+ :caption: Example 3
+
+ A: store ptr addrspace(1) %ptr2 # sync-as:1
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:2
+
+A and B do not have to be ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+
+Use-cases
+=========
+
+SPIRV ``NonPrivatePointer``
+---------------------------
+
+MMRAs can support the SPIRV capability
+``VulkanMemoryModel``, where synchronizing operations only affect
+memory operations that specify ``NonPrivatePointer`` semantics.
+
+The example below is generated from a SPIRV program using the
+following recipe:
+
+- Add ``vulkan:nonprivate`` to every synchronizing operation.
+- Add ``vulkan:nonprivate`` to every non-atomic memory operation
+ that is marked ``NonPrivatePointer``.
+- Add ``vulkan:private`` to tags of every non-atomic memory operation
+ that is not marked ``NonPrivatePointer``.
+
+.. code-block::
+
+ Thread T1:
+ A: store %ptr1 # vulkan:nonprivate
+ B: store %ptr2 # vulkan:private
+ X: store atomic release %ptr3 # vulkan:nonprivate
+
+ Thread T2:
+ Y: load atomic acquire %ptr3 # vulkan:nonprivate
+ C: load %ptr2 # vulkan:private
+ D: load %ptr1 # vulkan:nonprivate
+
+Compatibility ensures that operation ``A`` is ordered
+relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
+If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
+No such relation can be inferred about operations ``B`` and ``C``.
+
+.. note::
+ The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
+ considers all atomic operation non-private.
+
+ Whether ``vulkan:nonprivate`` would be specified on atomic operations is
+ an implementation detail, as an atomic operation is always ``nonprivate``.
+ The implementation may choose to be explicit and emit IR with
+ ``vulkan:nonprivate`` on every atomic operation, or it could choose to
+ only emit ``vulkan::private`` and assume ``vulkan:nonprivate``
+ by default.
+
+Operations marked with ``vulkan:private`` effectively opt out of the
+happens-before order in a SPIRV program since they are incompatible
+with every synchronizing operation. Note that SPIRV operations that
+are not marked ``NonPrivatePointer`` are not entirely private to the
+thread --- they are implicitly synchronized at the start or end of a
+thread by the Vulkan *system-synchronizes-with* relationship. This
+example assumes that the target-defined semantics of
+``vulkan:private`` correctly implements this property.
+
+This scheme is general enough to express the interoperability of SPIRV
+programs with other environments.
+
+.. code-block::
+
+ Thread T1:
+ A: store %ptr1 # vulkan:nonprivate
+ X: store atomic release %ptr2 # vulkan:nonprivate
+
+ Thread T2:
+ Y: load atomic acquire %ptr2 # foo:bar
+ B: load %ptr1
+
+In the above example, thread ``T1`` originates from a SPIRV program
+while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
+can synchronize with ``Y`` is target defined. If ``X`` synchronizes
+with ``Y``, then ``A`` happens before ``B`` (because A/X and
+Y/B are compatible).
+
+Implementation Example
+~~~~~~~~~~~~~~~~~~~~~~
+
+Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
+where all memory operations are cached, and the entire cache is
+flushed or invalidated at a ``release`` or ``acquire`` respectively. A
+possible scheme is that when translating a SPIRV program, memory
+operations marked ``NonPrivatePointer`` should not be cached, and the
+cache contents should not be touched during an ``acquire`` and
+``release`` operation.
+
+This could be implemented using the tags that share the ``vulkan:`` prefix,
+as follows:
+
+- For memory operations:
+
+ - Operations with ``vulkan:nonprivate`` should bypass the cache.
+ - Operations with ``vulkan:private`` should be cached.
+ - Operations that specify neither or both should conservatively
+ bypass the cache to ensure correctness.
+
+- For synchronizing operations:
+
+ - Operations with ``vulkan:nonprivate`` should not flush or
+ invalidate the cache.
+ - Operations with ``vulkan:private`` should flush or invalidate the cache.
+ - Operations that specify neither or both should conservatively
+ flush or invalidate the cache to ensure correctness.
+
+.. note::
+ In such an implementation, dropping the metadata on an operation, while
+ not affecting correctness, may have big performance implications.
+ e.g. an operation bypasses the cache when it shouldn't.
+
+Memory Types
+------------
+
+MMRAs may express the selective synchronization of
+different memory types.
+
+As an example, a target may expose an ``sync-as:<N>`` tag to
+pass information about which address spaces are synchronized by the
+execution of a synchronizing operation.
+
+.. note::
+ Address spaces are used here as a common example, but this concept isn't
+ can apply for other "memory types". What "memory types" means here is
+ up to the target.
+
+.. code-block::
+
+ # let 1 = global address space
+ # let 3 = local address space
+
+ Thread T1:
+ A: store %ptr1 # sync-as:1
+ B: store %ptr2 # sync-as:3
+ X: store atomic release ptr addrspace(0) %ptr3 # sync-as:3
+
+ Thread T2:
+ Y: load atomic acquire ptr addrspace(0) %ptr3 # sync-as:3
+ C: load %ptr2 # sync-as:3
+ D: load %ptr1 # sync-as:1
+
+In the above figure, ``X`` and ``Y`` are atomic operations on a
+location in the ``global`` address space. If ``X`` synchronizes with
+``Y``, then ``B`` happens-before ``C`` in the ``local`` address
+space. But no such statement can be made about operations ``A`` and
+``D``, although they are peformed on a location in the ``global``
+address space.
+
+Implementation Example: Adding Address Space Information to Fences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Languages such as OpenCL C provide fence operations such as
+``atomic_work_item_fence`` that can take an explicit address
+space to fence.
+
+By default, LLVM has no means to carry that information in the IR, so
+the information is lost during lowering to LLVM IR. This means that
+targets such as AMDGPU have to conservatively emit instructions to
+fence all address spaces in all cases, which can have a noticeable
+performance impact in high-performance applications.
+
+MMRAs may be used to preserve that information at the IR level, all the
+way through code generation. For example, a fence that only affects the
+global address space ``addrspace(1)`` may be lowered as
+
+.. code-block::
+
+ fence release # sync-as:1
+
+and the target may use the presence of ``sync-as:1`` to infer that it
+must only emit instruction to fence the global address space.
+
+Note that as MMRAs are opt in, a fence that does not have MMRA metadata
+could still be lowered conservatively, so this optimization would only
+apply if the front-end emits the MMRA metadata on the fence instructions.
+
+Additional Topics
+=================
+
+.. note::
+
+ The following sections are informational.
+
+Performance Impact
+------------------
+
+MMRAs are a way to capture optimization opportunities in the program.
+But when an operation mentions no tags or conflicting tags,
+the target may need to produce conservative code to ensure correctness
+at the cost of performance. This can happen in the following situations:
+
+1. When a target first introduces MMRAs, the
+ frontend might not have been updated to emit them.
+2. An optimization may drop MMRA metadata.
+3. An optimization may add arbitrary tags to an operation.
+
+Note that targets can always choose to ignore (or even drop) MMRAs
+and revert to the default behavior/codegen heuristics without
+affecting correctness.
+
+Consequences of the Absence of *happens-before*
+-----------------------------------------------
+
+In the :ref:`happens-before<HappensBefore>` section, we defined how an
+*happens-before* relation between two instruction can be broken
+by leveraging compatibility between MMRAs. When the instructions
+are incompatible and there is no *happens-before* relation, we say
+that the instructions "do not have to be ordered relative to each
+other".
+
+"Ordering" in this context is a very broad term which covers both
+static and runtime aspects.
+
+When there is no ordering constraint, we *could* statically reorder
+the instructions in an optimizer transform if the reordering does
+not break other constraints as single location coherence.
+Static reordering is one consequence of breaking *happens-before*,
+but is not the most interesting one.
+
+Run-time consequences are more interesting. When there is an
+*happens-before* relation between instructions, the target has to emit
+synchronization code to ensure other threads will observe the effects of
+the instructions in the right order.
+
+For instance, the target may have to wait for previous loads & stores to
+finish before starting a fence-release, or there may be a need to flush a
+memory cache before executing the next instruction.
+In the absence of *happens-before*, there is no such requirement and
+no waiting or flushing is required. This may noticeably speed up
+execution in some cases.
+
+Combining Operations
+--------------------
+
+If a pass can combine multiple memory or synchronizing operations
+into one, then the metadata of the new instruction(s) shall be a
+prefix-wise union of the metadata of the source instructions.
+
+Let A and B be two tags set, and U be the prefix-wise union of A and B.
+For every unique tag prefix P present in A or B:
+
+* If either A or B has no tags with prefix P, no tags with prefix
+ P are added to U.
+* If both A and B have at least one tag with prefix P, all tags with prefix
+ P from both sets are added to U.
+
+Passes should avoid aggressively combining MMRAs, as this can result
+in significant losses of information. While this cannot affect
+correctness, it may affect performance.
+
+As a general rule of thumb, common passes such as SimplifyCFG that
+aggressively combine/reorder operations should only combine
+instructions that have identical sets of tags.
+Passes that combine less frequently, or that are well aware of the cost
+of combining the MMRAs can use the prefix-wise union described above.
+
+Examples:
+
+.. code-block::
+
+ A: store release %ptr1 # foo:x, foo:y, bar:x
+ B: store release %ptr2 # foo:x, bar:y
+
+ # Unique prefixes P = [foo, bar]
+ # "foo:x" is common to A and B so it's added to U.
+ # "bar:x" != "bar:y" so it's not added to U.
+ U: store release %ptr3 # foo:x
+
+.. code-block::
+
+ A: store release %ptr1 # foo:x, foo:y
+ B: store release %ptr2 # foo:x, bux:y
+
+ # Unique prefixes P = [foo, bux]
+ # "foo:x" is common to A and B so it's added to U.
+ # No tags have the prefix "bux" in A.
+ U: store release %ptr3 # foo:x
+
+.. code-block::
+
+ A: store release %ptr1
+ B: store release %ptr2 # foo:x, bar:y
+
+ # Unique prefixes P = [foo, bar]
+ # No tags with "foo" or "bar" in A, so no tags added.
+ U: store release %ptr3
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index 3a1d1665be439e..1661c8c533db1d 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -39,6 +39,7 @@ LLVM and API reference documentation.
PDB/index
PointerAuth
ScudoHardenedAllocator
+ MemoryModelRelaxationAnnotations
MemTagSanitizer
Security
SecurityTransparencyReports
@@ -194,6 +195,9 @@ Additional Topics
:doc:`ScudoHardenedAllocator`
A library that implements a security-hardened `malloc()`.
+:doc:`MemoryModelRelaxationAnnotations`
+ Target-defined relaxation to LLVM's concurrency model.
+
:doc:`MemTagSanitizer`
Security hardening for production code aiming to mitigate memory
related vulnerabilities. Based on the Armv8.5-A Memory Tagging Extension.
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index c6eb66cc9660ca..424b73e375b5bc 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -301,7 +301,7 @@ MDNode *intersectAccessGroups(const Instruction *Inst1,
const Instruction *Inst2);
/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
-/// MD_nontemporal, MD_access_group].
+/// MD_nontemporal, MD_access_group, MD_mmra].
/// For K in Kinds, we get the MDNode for K from each of the
/// elements of VL, compute their "intersection" (i.e., the most generic
/// metadata value that covers all of the individual values), and set I's
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
index 4c9d85fd9f5140..760b41a9f7beb2 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -52,6 +52,8 @@ struct MachineIRBuilderState {
DebugLoc DL;
/// PC sections metadata to be set to any instruction we create.
MDNode *PCSections = nullptr;
+ /// MMRA Metadata to be set on any instruction we create.
+ MDNode *MMRA = nullptr;
/// \name Fields describing the insertion point.
/// @{
@@ -353,6 +355,7 @@ class MachineIRBuilder {
setMBB(*MI.getParent());
State.II = MI.getIterator();
setPCSections(MI.getPCSections());
+ setMMRAMetadata(MI.getMMRAMetadata());
}
/// @}
@@ -386,6 +389,12 @@ class MachineIRBuilder {
/// Get the current instruction's PC sections metadata.
MDNode *getPCSections() { return State.PCSections; }
+ /// Set the PC sections metadata to \p MD for all the next build instructions.
+ void setMMRAMetadata(MDNode *MMRA) { State.MMRA = MMRA; }
+
+ /// Get the current instruction's MMRA metadata.
+ MDNode *getMMRAMetadata() { return State.MMRA; }
+
/// Build and insert <empty> = \p Opcode <empty>.
/// The insertion point is the one set by the last call of either
/// setBasicBlock or setMI.
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index c2bff279449398..f64e1217de320b 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -1123,7 +1123,8 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
MachineInstr::ExtraInfo *createMIExtraInfo(
ArrayRef<MachineMemOperand *> MMOs, MCSymbol *PreInstrSymbol = nullptr,
MCSymbol *PostInstrSymbol = nullptr, MDNode *HeapAllocMarker = nullptr,
- MDNode *PCSections = nullptr, uint32_t CFIType = 0);
+ MDNode *PCSections = nullptr, uint32_t CFIType = 0,
+ MDNode *MMRAs = nullptr);
/// Allocate a string and populate it with the given external symbol name.
const char *createExternalSymbolName(StringRef Name);
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index 7249f812d2cc4c..78529990363cfb 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h
@@ -160,37 +160,41 @@ class MachineInstr
MCSymbol *PreInstrSymbol = nullptr,
MCSymbol *PostInstrSymbol = nullptr,
MDNode *HeapAllocMarker = nullptr,
- MDNode *PCSections = nullptr,
- uint32_t CFIType = 0) {
+ MDNode *PCSections = nullptr, uint32_t CFIType = 0,
+ MDNode *MMRAs = nullptr) {
bool HasPreInstrSymbol = PreInstrSymbol != nullptr;
bool HasPostInstrSymbol = PostInstrSymbol != nullptr;
bool HasHeapAllocMarker = HeapAllocMarker != nullptr;
+ bool HasMMRAs = MMRAs != nullptr;
bool HasCFIType = CFIType != 0;
bool HasPCSections = PCSections != nullptr;
auto *Result = new (Allocator.Allocate(
totalSizeToAlloc<MachineMemOperand *, MCSymbol *, MDNode *, uint32_t>(
MMOs.size(), HasPreInstrSymbol + HasPostInstrSymbol,
- HasHeapAllocMarker + HasPCSections, HasCFIType),
+ HasHeapAllocMarker + HasPCSections + HasMMRAs, HasCFIType),
alignof(ExtraInfo)))
ExtraInfo(MMOs.size(), HasPreInstrSymbol, HasPostInstrSymbol,
- HasHeapAllocMarker, HasPCSections, HasCFIType);
+ HasHeapAllocMarker, HasPCSections, HasCFIType, HasMMRAs);
// Copy the actual data into the trailing objects.
std::copy(MMOs.begin(), MMOs.end(),
Result->getTrailingObjects<MachineMemOperand *>());
+ unsigned MDNodeIdx = 0;
+
if (HasPreInstrSymbol)
Result->getTrailingObjects<MCSymbol *>()[0] = PreInstrSymbol;
if (HasPostInstrSymbol)
Result->getTrailingObjects<MCSymbol *>()[HasPreInstrSymbol] =
PostInstrSymbol;
if (HasHeapAllocMarker)
- Result->getTrailingObjects<MDNode *>()[0] = HeapAllocMarker;
+ Result->getTrailingObjects<MDNode *>()[MDNodeIdx++] = HeapAllocMarker;
if (HasPCSections)
- Result->getTrailingObjects<MDNode *>()[HasHeapAllocMarker] =
- PCSections;
+ Result->getTrailingObjects<MDNode *>()[MDNodeIdx++] = PCSections;
if (HasCFIType)
Result->getTrailingObjects<uint32_t>()[0] = CFIType;
+ if (HasMMRAs)
+ Result->getTrailingObjects<MDNode *>()[MDNodeIdx++] = MMRAs;
return Result;
}
@@ -223,6 +227,12 @@ class MachineInstr
return HasCFIType ? getTrailingObjects<uint32_t>()[0] : 0;
}
+ MDNode *getMMRAMetadata() const {
+ return HasMMRAs ? getTrailingObjects<MDNode *>()[HasHeapAllocMarker +
+ HasPCSections]
+ : nullptr;
+ }
+
private:
friend TrailingObjects;
@@ -237,6 +247,7 @@ class MachineInstr
const bool HasHeapAllocMarker;
const bool HasPCSections;
const bool HasCFIType;
+ const bool HasMMRAs;
// Implement the `TrailingObjects` internal API.
size_t numTrailingObjects(OverloadToken<MachineMemOperand *>) const {
@@ -255,11 +266,12 @@ class MachineInstr
// Just a boring constructor to allow us to initialize the sizes. Always use
// the `create` routine above.
ExtraInfo(int NumMMOs, bool HasPreInstrSymbol, bool HasPostInstrSymbol,
- bool HasHeapAllocMarker, bool HasPCSections, bool HasCFIType)
+ bool HasHeapAllocMarker, bool HasPCSections, bool HasCFIType,
+ bool HasMMRAs)
: NumMMOs(NumMMOs), HasPreInstrSymbol(HasPreInstrSymbol),
HasPostInstrSymbol(HasPostInstrSymbol),
HasHeapAllocMarker(HasHeapAllocMarker), HasPCSections(HasPCSections),
- HasCFIType(HasCFIType) {}
+ HasCFIType(HasCFIType), HasMMRAs(HasMMRAs) {}
};
/// Enumeration of the kinds of inline extra info available. It is important
@@ -838,6 +850,15 @@ class MachineInstr
return nullptr;
}
+ /// Helper to extract mmra.op metadata.
+ MDNode *getMMRAMetadata() const {
+ if (!Info)
+ return nullptr;
+ if (ExtraInfo *EI = Info.get<EIIK_OutOfLine>())
+ return EI->getMMRAMetadata();
+ return nullptr;
+ }
+
/// Helper to extract a CFI type hash if one has been added.
uint32_t getCFIType() const {
if (!Info)
@@ -1902,6 +1923,8 @@ class MachineInstr
// addresses into.
void setPCSections(MachineFunction &MF, MDNode *MD);
+ void setMMRAMetadata(MachineFunction &MF, MDNode *MMRAs);
+
/// Set the CFI type for the instruction.
void setCFIType(MachineFunction &MF, uint32_t Type);
@@ -2014,7 +2037,7 @@ class MachineInstr
void setExtraInfo(MachineFunction &MF, ArrayRef<MachineMemOperand *> MMOs,
MCSymbol *PreInstrSymbol, MCSymbol *PostInstrSymbol,
MDNode *HeapAllocMarker, MDNode *PCSections,
- uint32_t CFIType);
+ uint32_t CFIType, MDNode *MMRAs);
};
/// Special DenseMapInfo traits to compare MachineInstr* by *value* of the
diff --git a/llvm/include/llvm/CodeGen/MachineInstrBuilder.h b/llvm/include/llvm/CodeGen/MachineInstrBuilder.h
index 954d8e6770a294..a5b8d3af3cc9b7 100644
--- a/llvm/include/llvm/CodeGen/MachineInstrBuilder.h
+++ b/llvm/include/llvm/CodeGen/MachineInstrBuilder.h
@@ -322,6 +322,12 @@ class MachineInstrBuilder {
return *this;
}
+ const MachineInstrBuilder &setMMRAMetadata(MDNode *MMRA) const {
+ if (MMRA)
+ MI->setMMRAMetadata(*MF, MMRA);
+ return *this;
+ }
+
/// Copy all the implicit operands from OtherMI onto this one.
const MachineInstrBuilder &
copyImplicitOps(const MachineInstr &OtherMI) const {
@@ -337,14 +343,15 @@ class MachineInstrBuilder {
};
/// Set of metadata that should be preserved when using BuildMI(). This provides
-/// a more convenient way of preserving DebugLoc and PCSections.
+/// a more convenient way of preserving DebugLoc, PCSections and MMRA.
class MIMetadata {
public:
MIMetadata() = default;
- MIMetadata(DebugLoc DL, MDNode *PCSections = nullptr)
- : DL(std::move(DL)), PCSections(PCSections) {}
- MIMetadata(const DILocation *DI, MDNode *PCSections = nullptr)
- : DL(DI), PCSections(PCSections) {}
+ MIMetadata(DebugLoc DL, MDNode *PCSections = nullptr, MDNode *MMRA = nullptr)
+ : DL(std::move(DL)), PCSections(PCSections), MMRA(MMRA) {}
+ MIMetadata(const DILocation *DI, MDNode *PCSections = nullptr,
+ MDNode *MMRA = nullptr)
+ : DL(DI), PCSections(PCSections), MMRA(MMRA) {}
explicit MIMetadata(const Instruction &From)
: DL(From.getDebugLoc()),
PCSections(From.getMetadata(LLVMContext::MD_pcsections)) {}
@@ -353,17 +360,20 @@ class MIMetadata {
const DebugLoc &getDL() const { return DL; }
MDNode *getPCSections() const { return PCSections; }
+ MDNode *getMMRAMetadata() const { return MMRA; }
private:
DebugLoc DL;
MDNode *PCSections = nullptr;
+ MDNode *MMRA = nullptr;
};
/// Builder interface. Specify how to create the initial instruction itself.
inline MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD,
const MCInstrDesc &MCID) {
return MachineInstrBuilder(MF, MF.CreateMachineInstr(MCID, MIMD.getDL()))
- .setPCSections(MIMD.getPCSections());
+ .setPCSections(MIMD.getPCSections())
+ .setMMRAMetadata(MIMD.getMMRAMetadata());
}
/// This version of the builder sets up the first operand as a
@@ -371,8 +381,9 @@ inline MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD,
inline MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD,
const MCInstrDesc &MCID, Register DestReg) {
return MachineInstrBuilder(MF, MF.CreateMachineInstr(MCID, MIMD.getDL()))
- .setPCSections(MIMD.getPCSections())
- .addReg(DestReg, RegState::Define);
+ .setPCSections(MIMD.getPCSections())
+ .setMMRAMetadata(MIMD.getMMRAMetadata())
+ .addReg(DestReg, RegState::Define);
}
/// This version of the builder inserts the newly-built instruction before
@@ -386,8 +397,9 @@ inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB,
MachineInstr *MI = MF.CreateMachineInstr(MCID, MIMD.getDL());
BB.insert(I, MI);
return MachineInstrBuilder(MF, MI)
- .setPCSections(MIMD.getPCSections())
- .addReg(DestReg, RegState::Define);
+ .setPCSections(MIMD.getPCSections())
+ .setMMRAMetadata(MIMD.getMMRAMetadata())
+ .addReg(DestReg, RegState::Define);
}
/// This version of the builder inserts the newly-built instruction before
@@ -404,8 +416,9 @@ inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB,
MachineInstr *MI = MF.CreateMachineInstr(MCID, MIMD.getDL());
BB.insert(I, MI);
return MachineInstrBuilder(MF, MI)
- .setPCSections(MIMD.getPCSections())
- .addReg(DestReg, RegState::Define);
+ .setPCSections(MIMD.getPCSections())
+ .setMMRAMetadata(MIMD.getMMRAMetadata())
+ .addReg(DestReg, RegState::Define);
}
inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB, MachineInstr &I,
@@ -435,7 +448,9 @@ inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB,
MachineFunction &MF = *BB.getParent();
MachineInstr *MI = MF.CreateMachineInstr(MCID, MIMD.getDL());
BB.insert(I, MI);
- return MachineInstrBuilder(MF, MI).setPCSections(MIMD.getPCSections());
+ return MachineInstrBuilder(MF, MI)
+ .setPCSections(MIMD.getPCSections())
+ .setMMRAMetadata(MIMD.getMMRAMetadata());
}
inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB,
@@ -445,7 +460,9 @@ inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB,
MachineFunction &MF = *BB.getParent();
MachineInstr *MI = MF.CreateMachineInstr(MCID, MIMD.getDL());
BB.insert(I, MI);
- return MachineInstrBuilder(MF, MI).setPCSections(MIMD.getPCSections());
+ return MachineInstrBuilder(MF, MI)
+ .setPCSections(MIMD.getPCSections())
+ .setMMRAMetadata(MIMD.getMMRAMetadata());
}
inline MachineInstrBuilder BuildMI(MachineBasicBlock &BB, MachineInstr &I,
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 574c63552ce0ae..8237508424ecf0 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -285,6 +285,7 @@ class SelectionDAG {
CallSiteInfo CSInfo;
MDNode *HeapAllocSite = nullptr;
MDNode *PCSections = nullptr;
+ MDNode *MMRA = nullptr;
bool NoMerge = false;
};
/// Out-of-line extra information for SDNodes.
@@ -2280,11 +2281,21 @@ class SelectionDAG {
void addPCSections(const SDNode *Node, MDNode *MD) {
SDEI[Node].PCSections = MD;
}
+ /// Set MMRAMetadata to be associated with Node.
+ void addMMRAMetadata(const SDNode *Node, MDNode *MMRA) {
+ SDEI[Node].MMRA = MMRA;
+ }
/// Return PCSections associated with Node, or nullptr if none exists.
MDNode *getPCSections(const SDNode *Node) const {
auto It = SDEI.find(Node);
return It != SDEI.end() ? It->second.PCSections : nullptr;
}
+ /// Return the MMRA MDNode associated with Node, or nullptr if none
+ /// exists.
+ MDNode *getMMRAMetadata(const SDNode *Node) const {
+ auto It = SDEI.find(Node);
+ return It != SDEI.end() ? It->second.MMRA : nullptr;
+ }
/// Set NoMergeSiteInfo to be associated with Node if NoMerge is true.
void addNoMergeSiteInfo(const SDNode *Node, bool NoMerge) {
if (NoMerge)
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index b375d0f0912060..5f4cc230a0f5ff 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -51,3 +51,4 @@ LLVM_FIXED_MD_KIND(MD_kcfi_type, "kcfi_type", 36)
LLVM_FIXED_MD_KIND(MD_pcsections, "pcsections", 37)
LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
+LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index 2e2ec9a1c83026..d7bc40bccd1a2b 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -244,10 +244,7 @@ class IRBuilderBase {
void SetInstDebugLocation(Instruction *I) const;
/// Add all entries in MetadataToCopy to \p I.
- void AddMetadataToInst(Instruction *I) const {
- for (const auto &KV : MetadataToCopy)
- I->setMetadata(KV.first, KV.second);
- }
+ void AddMetadataToInst(Instruction *I) const;
/// Get the return type of the current function that we're emitting
/// into.
diff --git a/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h b/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
new file mode 100644
index 00000000000000..88824dd59518b5
--- /dev/null
+++ b/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
@@ -0,0 +1,109 @@
+//===- MemoryModelRelaxationAnnotations.h -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file provides utility for Memory Model Relaxation Annotations (MMRAs).
+/// Those annotations are represented using Metadata. The MMRATagSet class
+/// offers a simple API to parse the metadata and perform common operations on
+/// it. The MMRAMetadata class is a simple tuple of MDNode that provides easy
+/// access to all MMRA annotations on an instruction.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_IR_MEMORYMODELRELAXATIONANNOTATIONS_H
+#define LLVM_IR_MEMORYMODELRELAXATIONANNOTATIONS_H
+
+#include <set>
+#include <string>
+#include <vector>
+
+namespace llvm {
+
+class MDNode;
+class MDTuple;
+class Metadata;
+class StringRef;
+class raw_ostream;
+class LLVMContext;
+class Instruction;
+
+/// Helper class for `!mmra` metadata nodes which can both build MMRA MDNodes,
+/// and parse them.
+///
+/// This can be visualized as a set of "tags", with each tag
+/// representing a particular property of an instruction, as
+/// explained in the MemoryModelRelaxationAnnotations docs.
+///
+/// This class (and the optimizer in general) does not reason
+/// about the exact nature of the tags and the properties they
+/// imply. It just sees the metadata as a collection of tags, which
+/// are a prefix/suffix pair of strings.
+class MMRAMetadata {
+public:
+ using TagT = std::pair<std::string, std::string>;
+ using SetT = std::set<TagT>;
+ using const_iterator = SetT::const_iterator;
+
+ MMRAMetadata() = default;
+ MMRAMetadata(const Instruction &I);
+ MMRAMetadata(MDNode *MD);
+
+ /// \returns whether the MMRAs on \p A and \p B are compatible.
+ static bool checkCompatibility(const Instruction &A, const Instruction &B) {
+ return MMRAMetadata(A).isCompatibleWith(B);
+ }
+
+ /// \returns true if \p MD is a well-formed MMRA tag.
+ static bool isTagMD(const Metadata *MD);
+
+ /// \returns whether this set of tags is compatible with \p Other.
+ bool isCompatibleWith(const MMRAMetadata &Other) const;
+
+ /// Combines this set of tags with \p Other.
+ /// \returns a new set of tags containing the result.
+ MMRAMetadata combine(const MMRAMetadata &Other) const;
+
+ MMRAMetadata &addTag(StringRef Prefix, StringRef Suffix);
+ MMRAMetadata &addTag(const TagT &Tag) {
+ Tags.insert(Tag);
+ return *this;
+ }
+
+ bool hasTag(StringRef Prefix, StringRef Suffix) const;
+ bool hasTag(const TagT &Tag) const { return Tags.count(Tag); }
+
+ bool hasTagWithPrefix(StringRef Prefix) const;
+
+ std::vector<TagT> getAllTagsWithPrefix(StringRef Prefix) const;
+
+ MDTuple *getAsMD(LLVMContext &Ctx) const;
+
+ const_iterator begin() const;
+ const_iterator end() const;
+ bool empty() const;
+ unsigned size() const;
+
+ void print(raw_ostream &OS) const;
+ void dump() const;
+
+ operator bool() const { return !Tags.empty(); }
+ bool operator==(const MMRAMetadata &Other) const {
+ return Tags == Other.Tags;
+ }
+ bool operator!=(const MMRAMetadata &Other) const {
+ return Tags != Other.Tags;
+ }
+
+private:
+ SetT Tags;
+};
+
+bool canInstructionHaveMMRAs(const Instruction &I);
+
+} // namespace llvm
+
+#endif
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index bf7bc0ba84a033..a86be5608ae7b3 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -23,6 +23,7 @@
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/CommandLine.h"
@@ -793,13 +794,19 @@ Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef<Value *> VL) {
for (auto Kind : {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_fpmath,
LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load,
- LLVMContext::MD_access_group}) {
+ LLVMContext::MD_access_group, LLVMContext::MD_mmra}) {
MDNode *MD = I0->getMetadata(Kind);
-
for (int J = 1, E = VL.size(); MD && J != E; ++J) {
const Instruction *IJ = cast<Instruction>(VL[J]);
MDNode *IMD = IJ->getMetadata(Kind);
+
switch (Kind) {
+ case LLVMContext::MD_mmra: {
+ auto Tags = MMRAMetadata(dyn_cast_or_null<MDTuple>(MD));
+ auto ITags = MMRAMetadata(dyn_cast_or_null<MDTuple>(IMD));
+ MD = Tags.combine(ITags).getAsMD(Inst->getContext());
+ break;
+ }
case LLVMContext::MD_tbaa:
MD = MDNode::getMostGenericTBAA(MD, IMD);
break;
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index d5db79df68622a..730877b7abb8da 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -139,9 +139,9 @@ struct ReplacementIRBuilder : IRBuilder<InstSimplifyFolder> {
explicit ReplacementIRBuilder(Instruction *I, const DataLayout &DL)
: IRBuilder(I->getContext(), DL) {
SetInsertPoint(I);
- this->CollectMetadataToCopy(I, {LLVMContext::MD_pcsections});
if (BB->getParent()->getAttributes().hasFnAttr(Attribute::StrictFP))
this->setIsFPConstrained(true);
+ this->CollectMetadataToCopy(I, {LLVMContext::MD_pcsections, LLVMContext::MD_mmra});
}
};
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index 47e980e05281fc..5b2d1ea03937eb 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -3426,6 +3426,7 @@ void IRTranslator::translateDbgInfo(const Instruction &Inst,
bool IRTranslator::translate(const Instruction &Inst) {
CurBuilder->setDebugLoc(Inst.getDebugLoc());
CurBuilder->setPCSections(Inst.getMetadata(LLVMContext::MD_pcsections));
+ CurBuilder->setMMRAMetadata(Inst.getMetadata(LLVMContext::MD_mmra));
if (TLI->fallBackToDAGISel(Inst))
return false;
diff --git a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
index b8ba782254c370..e8d427356ffd01 100644
--- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
@@ -28,6 +28,7 @@ void MachineIRBuilder::setMF(MachineFunction &MF) {
State.TII = MF.getSubtarget().getInstrInfo();
State.DL = DebugLoc();
State.PCSections = nullptr;
+ State.MMRA = nullptr;
State.II = MachineBasicBlock::iterator();
State.Observer = nullptr;
}
@@ -37,7 +38,8 @@ void MachineIRBuilder::setMF(MachineFunction &MF) {
//------------------------------------------------------------------------------
MachineInstrBuilder MachineIRBuilder::buildInstrNoInsert(unsigned Opcode) {
- return BuildMI(getMF(), {getDL(), getPCSections()}, getTII().get(Opcode));
+ return BuildMI(getMF(), {getDL(), getPCSections(), getMMRAMetadata()},
+ getTII().get(Opcode));
}
MachineInstrBuilder MachineIRBuilder::insertInstr(MachineInstrBuilder MIB) {
diff --git a/llvm/lib/CodeGen/MIRPrinter.cpp b/llvm/lib/CodeGen/MIRPrinter.cpp
index 4cf3074ea3ffaf..7bb4ce04e5d260 100644
--- a/llvm/lib/CodeGen/MIRPrinter.cpp
+++ b/llvm/lib/CodeGen/MIRPrinter.cpp
@@ -854,6 +854,13 @@ void MIPrinter::print(const MachineInstr &MI) {
PCSections->printAsOperand(OS, MST);
NeedComma = true;
}
+ if (MDNode *MMRA = MI.getMMRAMetadata()) {
+ if (NeedComma)
+ OS << ',';
+ OS << " mmra ";
+ MMRA->printAsOperand(OS, MST);
+ NeedComma = true;
+ }
if (uint32_t CFIType = MI.getCFIType()) {
if (NeedComma)
OS << ',';
diff --git a/llvm/lib/CodeGen/MachineFunction.cpp b/llvm/lib/CodeGen/MachineFunction.cpp
index ad532149926670..8366ad2859069f 100644
--- a/llvm/lib/CodeGen/MachineFunction.cpp
+++ b/llvm/lib/CodeGen/MachineFunction.cpp
@@ -573,10 +573,10 @@ MachineFunction::getMachineMemOperand(const MachineMemOperand *MMO,
MachineInstr::ExtraInfo *MachineFunction::createMIExtraInfo(
ArrayRef<MachineMemOperand *> MMOs, MCSymbol *PreInstrSymbol,
MCSymbol *PostInstrSymbol, MDNode *HeapAllocMarker, MDNode *PCSections,
- uint32_t CFIType) {
+ uint32_t CFIType, MDNode *MMRAs) {
return MachineInstr::ExtraInfo::create(Allocator, MMOs, PreInstrSymbol,
PostInstrSymbol, HeapAllocMarker,
- PCSections, CFIType);
+ PCSections, CFIType, MMRAs);
}
const char *MachineFunction::createExternalSymbolName(StringRef Name) {
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 83604003a038bd..f377746e6c744d 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -318,14 +318,15 @@ void MachineInstr::setExtraInfo(MachineFunction &MF,
MCSymbol *PreInstrSymbol,
MCSymbol *PostInstrSymbol,
MDNode *HeapAllocMarker, MDNode *PCSections,
- uint32_t CFIType) {
+ uint32_t CFIType, MDNode *MMRAs) {
bool HasPreInstrSymbol = PreInstrSymbol != nullptr;
bool HasPostInstrSymbol = PostInstrSymbol != nullptr;
bool HasHeapAllocMarker = HeapAllocMarker != nullptr;
bool HasPCSections = PCSections != nullptr;
bool HasCFIType = CFIType != 0;
+ bool HasMMRAs = MMRAs != nullptr;
int NumPointers = MMOs.size() + HasPreInstrSymbol + HasPostInstrSymbol +
- HasHeapAllocMarker + HasPCSections + HasCFIType;
+ HasHeapAllocMarker + HasPCSections + HasCFIType + HasMMRAs;
// Drop all extra info if there is none.
if (NumPointers <= 0) {
@@ -337,11 +338,11 @@ void MachineInstr::setExtraInfo(MachineFunction &MF,
// out of line because PointerSumType cannot hold more than 4 tag types with
// 32-bit pointers.
// FIXME: Maybe we should make the symbols in the extra info mutable?
- else if (NumPointers > 1 || HasHeapAllocMarker || HasPCSections ||
+ else if (NumPointers > 1 || HasMMRAs || HasHeapAllocMarker || HasPCSections ||
HasCFIType) {
Info.set<EIIK_OutOfLine>(
MF.createMIExtraInfo(MMOs, PreInstrSymbol, PostInstrSymbol,
- HeapAllocMarker, PCSections, CFIType));
+ HeapAllocMarker, PCSections, CFIType, MMRAs));
return;
}
@@ -359,7 +360,8 @@ void MachineInstr::dropMemRefs(MachineFunction &MF) {
return;
setExtraInfo(MF, {}, getPreInstrSymbol(), getPostInstrSymbol(),
- getHeapAllocMarker(), getPCSections(), getCFIType());
+ getHeapAllocMarker(), getPCSections(), getCFIType(),
+ getMMRAMetadata());
}
void MachineInstr::setMemRefs(MachineFunction &MF,
@@ -370,7 +372,8 @@ void MachineInstr::setMemRefs(MachineFunction &MF,
}
setExtraInfo(MF, MMOs, getPreInstrSymbol(), getPostInstrSymbol(),
- getHeapAllocMarker(), getPCSections(), getCFIType());
+ getHeapAllocMarker(), getPCSections(), getCFIType(),
+ getMMRAMetadata());
}
void MachineInstr::addMemOperand(MachineFunction &MF,
@@ -394,7 +397,8 @@ void MachineInstr::cloneMemRefs(MachineFunction &MF, const MachineInstr &MI) {
if (getPreInstrSymbol() == MI.getPreInstrSymbol() &&
getPostInstrSymbol() == MI.getPostInstrSymbol() &&
getHeapAllocMarker() == MI.getHeapAllocMarker() &&
- getPCSections() == MI.getPCSections()) {
+ getPCSections() == MI.getPCSections() && getMMRAMetadata() &&
+ MI.getMMRAMetadata()) {
Info = MI.Info;
return;
}
@@ -479,7 +483,8 @@ void MachineInstr::setPreInstrSymbol(MachineFunction &MF, MCSymbol *Symbol) {
}
setExtraInfo(MF, memoperands(), Symbol, getPostInstrSymbol(),
- getHeapAllocMarker(), getPCSections(), getCFIType());
+ getHeapAllocMarker(), getPCSections(), getCFIType(),
+ getMMRAMetadata());
}
void MachineInstr::setPostInstrSymbol(MachineFunction &MF, MCSymbol *Symbol) {
@@ -494,7 +499,8 @@ void MachineInstr::setPostInstrSymbol(MachineFunction &MF, MCSymbol *Symbol) {
}
setExtraInfo(MF, memoperands(), getPreInstrSymbol(), Symbol,
- getHeapAllocMarker(), getPCSections(), getCFIType());
+ getHeapAllocMarker(), getPCSections(), getCFIType(),
+ getMMRAMetadata());
}
void MachineInstr::setHeapAllocMarker(MachineFunction &MF, MDNode *Marker) {
@@ -503,7 +509,7 @@ void MachineInstr::setHeapAllocMarker(MachineFunction &MF, MDNode *Marker) {
return;
setExtraInfo(MF, memoperands(), getPreInstrSymbol(), getPostInstrSymbol(),
- Marker, getPCSections(), getCFIType());
+ Marker, getPCSections(), getCFIType(), getMMRAMetadata());
}
void MachineInstr::setPCSections(MachineFunction &MF, MDNode *PCSections) {
@@ -512,7 +518,8 @@ void MachineInstr::setPCSections(MachineFunction &MF, MDNode *PCSections) {
return;
setExtraInfo(MF, memoperands(), getPreInstrSymbol(), getPostInstrSymbol(),
- getHeapAllocMarker(), PCSections, getCFIType());
+ getHeapAllocMarker(), PCSections, getCFIType(),
+ getMMRAMetadata());
}
void MachineInstr::setCFIType(MachineFunction &MF, uint32_t Type) {
@@ -521,7 +528,16 @@ void MachineInstr::setCFIType(MachineFunction &MF, uint32_t Type) {
return;
setExtraInfo(MF, memoperands(), getPreInstrSymbol(), getPostInstrSymbol(),
- getHeapAllocMarker(), getPCSections(), Type);
+ getHeapAllocMarker(), getPCSections(), Type, getMMRAMetadata());
+}
+
+void MachineInstr::setMMRAMetadata(MachineFunction &MF, MDNode *MMRAs) {
+ // Do nothing if old and new symbols are the same.
+ if (MMRAs == getMMRAMetadata())
+ return;
+
+ setExtraInfo(MF, memoperands(), getPreInstrSymbol(), getPostInstrSymbol(),
+ getHeapAllocMarker(), getPCSections(), getCFIType(), MMRAs);
}
void MachineInstr::cloneInstrSymbols(MachineFunction &MF,
@@ -537,6 +553,7 @@ void MachineInstr::cloneInstrSymbols(MachineFunction &MF,
setPostInstrSymbol(MF, MI.getPostInstrSymbol());
setHeapAllocMarker(MF, MI.getHeapAllocMarker());
setPCSections(MF, MI.getPCSections());
+ setMMRAMetadata(MF, MI.getMMRAMetadata());
}
uint32_t MachineInstr::mergeFlagsWith(const MachineInstr &Other) const {
@@ -1881,6 +1898,14 @@ void MachineInstr::print(raw_ostream &OS, ModuleSlotTracker &MST,
OS << " pcsections ";
PCSections->printAsOperand(OS, MST);
}
+ if (MDNode *MMRA = getMMRAMetadata()) {
+ if (!FirstOp) {
+ FirstOp = false;
+ OS << ',';
+ }
+ OS << " mmra ";
+ MMRA->printAsOperand(OS, MST);
+ }
if (uint32_t CFIType = getCFIType()) {
if (!FirstOp)
OS << ',';
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index c9e2745f00c958..6a683720e5a047 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -27,6 +27,7 @@
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/Config/llvm-config.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/MC/MCInstrItineraries.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
@@ -898,6 +899,14 @@ EmitSchedule(MachineBasicBlock::iterator &InsertPos) {
if (MDNode *MD = DAG->getPCSections(Node))
MI->setPCSections(MF, MD);
+ // Set MMRAs on _all_ added instructions.
+ if (MDNode *MMRA = DAG->getMMRAMetadata(Node)) {
+ for (MachineBasicBlock::iterator It = MI->getIterator(),
+ End = std::next(After);
+ It != End; ++It)
+ It->setMMRAMetadata(MF, MMRA);
+ }
+
return MI;
};
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index e8d1ac1d3a9167..7a161032867382 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12913,7 +12913,7 @@ void SelectionDAG::copyExtraInfo(SDNode *From, SDNode *To) {
// Use of operator[] on the DenseMap may cause an insertion, which invalidates
// the iterator, hence the need to make a copy to prevent a use-after-free.
NodeExtraInfo NEI = I->second;
- if (LLVM_LIKELY(!NEI.PCSections)) {
+ if (LLVM_LIKELY(!NEI.PCSections) && LLVM_LIKELY(!NEI.MMRA)) {
// No deep copy required for the types of extra info set.
//
// FIXME: Investigate if other types of extra info also need deep copy. This
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 618bdee7f40531..e80a8e5b162061 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -80,6 +80,7 @@
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IntrinsicsWebAssembly.h"
#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"
@@ -1326,7 +1327,8 @@ void SelectionDAGBuilder::visit(const Instruction &I) {
bool NodeInserted = false;
std::unique_ptr<SelectionDAG::DAGNodeInsertedListener> InsertedListener;
MDNode *PCSectionsMD = I.getMetadata(LLVMContext::MD_pcsections);
- if (PCSectionsMD) {
+ MDNode *MMRA = I.getMetadata(LLVMContext::MD_mmra);
+ if (PCSectionsMD || MMRA) {
InsertedListener = std::make_unique<SelectionDAG::DAGNodeInsertedListener>(
DAG, [&](SDNode *) { NodeInserted = true; });
}
@@ -1338,14 +1340,17 @@ void SelectionDAGBuilder::visit(const Instruction &I) {
CopyToExportRegsIfNeeded(&I);
// Handle metadata.
- if (PCSectionsMD) {
+ if (PCSectionsMD || MMRA) {
auto It = NodeMap.find(&I);
if (It != NodeMap.end()) {
- DAG.addPCSections(It->second.getNode(), PCSectionsMD);
+ if (PCSectionsMD)
+ DAG.addPCSections(It->second.getNode(), PCSectionsMD);
+ if (MMRA)
+ DAG.addMMRAMetadata(It->second.getNode(), MMRA);
} else if (NodeInserted) {
// This should not happen; if it does, don't let it go unnoticed so we can
// fix it. Relevant visit*() function is probably missing a setValue().
- errs() << "warning: loosing !pcsections metadata ["
+ errs() << "warning: loosing !pcsections and/or !mmra metadata ["
<< I.getModule()->getName() << "]\n";
LLVM_DEBUG(I.dump());
assert(false);
@@ -5282,9 +5287,9 @@ void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I,
Result =
DAG.getAssertAlign(getCurSDLoc(), Result, Alignment.valueOrOne());
}
-
- setValue(&I, Result);
}
+
+ setValue(&I, Result);
}
/// GetSignificand - Get the significand and build it into a floating-point
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 20375a0f92b238..6de15d58daacf2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -904,6 +904,13 @@ void SDNode::print_details(raw_ostream &OS, const SelectionDAG *G) const {
MD->printAsOperand(OS, G->getMachineFunction().getFunction().getParent());
OS << ']';
}
+
+ if (MDNode *MMRA = G ? G->getMMRAMetadata(this) : nullptr) {
+ OS << " [mmra ";
+ MMRA->printAsOperand(OS,
+ G->getMachineFunction().getFunction().getParent());
+ OS << ']';
+ }
}
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
index d629c36bc792e3..b5694c955b8c8f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -1059,6 +1059,8 @@ class ISelUpdater : public SelectionDAG::DAGUpdateListener {
SDNode *CurNode = &*ISelPosition;
if (MDNode *MD = DAG.getPCSections(CurNode))
DAG.addPCSections(N, MD);
+ if (MDNode *MMRA = DAG.getMMRAMetadata(CurNode))
+ DAG.addMMRAMetadata(N, MMRA);
}
};
diff --git a/llvm/lib/IR/CMakeLists.txt b/llvm/lib/IR/CMakeLists.txt
index f1668ee3be63b5..b5fb7409d8e88e 100644
--- a/llvm/lib/IR/CMakeLists.txt
+++ b/llvm/lib/IR/CMakeLists.txt
@@ -41,6 +41,7 @@ add_llvm_component_library(LLVMCore
LLVMRemarkStreamer.cpp
LegacyPassManager.cpp
MDBuilder.cpp
+ MemoryModelRelaxationAnnotations.cpp
Mangler.cpp
Metadata.cpp
Module.cpp
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index d6746d1d438242..6f51c048560837 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -23,6 +23,7 @@
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/NoFolder.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/Statepoint.h"
@@ -75,6 +76,14 @@ void IRBuilderBase::SetInstDebugLocation(Instruction *I) const {
}
}
+void IRBuilderBase::AddMetadataToInst(Instruction *I) const {
+ for (const auto &KV : MetadataToCopy) {
+ if (KV.first == LLVMContext::MD_mmra && !canInstructionHaveMMRAs(*I))
+ continue;
+ I->setMetadata(KV.first, KV.second);
+ }
+}
+
CallInst *
IRBuilderBase::createCallHelper(Function *Callee, ArrayRef<Value *> Ops,
const Twine &Name, Instruction *FMFSource,
diff --git a/llvm/lib/IR/Instruction.cpp b/llvm/lib/IR/Instruction.cpp
index 0602a55b9fe7f3..1460eb5cb5cff8 100644
--- a/llvm/lib/IR/Instruction.cpp
+++ b/llvm/lib/IR/Instruction.cpp
@@ -17,6 +17,7 @@
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/ProfDataUtils.h"
#include "llvm/IR/Type.h"
@@ -496,7 +497,8 @@ void Instruction::dropUBImplyingAttrsAndMetadata() {
// !noundef and various AA metadata must be dropped, as it generally produces
// immediate undefined behavior.
unsigned KnownIDs[] = {LLVMContext::MD_annotation, LLVMContext::MD_range,
- LLVMContext::MD_nonnull, LLVMContext::MD_align};
+ LLVMContext::MD_nonnull, LLVMContext::MD_align,
+ LLVMContext::MD_mmra};
dropUBImplyingAttrsAndUnknownMetadata(KnownIDs);
}
diff --git a/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp b/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
new file mode 100644
index 00000000000000..4d9b24c005f455
--- /dev/null
+++ b/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
@@ -0,0 +1,186 @@
+//===- MemoryModelRelaxationAnnotations.cpp ---------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/ADT/StringSet.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+
+// FIXME: Only needed for canInstructionHaveMMRAs, should it move to another
+// file?
+#include "llvm/IR/Instructions.h"
+
+using namespace llvm;
+
+static bool isTagMD(const MDNode *MD) {
+ return isa<MDTuple>(MD) && MD->getNumOperands() == 2 &&
+ isa<MDString>(MD->getOperand(0)) && isa<MDString>(MD->getOperand(1));
+}
+
+MMRAMetadata::MMRAMetadata(const Instruction &I)
+ : MMRAMetadata(I.getMetadata(LLVMContext::MD_mmra)) {}
+
+MMRAMetadata::MMRAMetadata(MDNode *MD) {
+ if (!MD)
+ return;
+
+ // TODO: Split this into a "tryParse" function that can return an err.
+ // CTor can use the tryParse & just fatal on err.
+
+ MDTuple *Tuple = dyn_cast<MDTuple>(MD);
+ assert(Tuple && "Invalid MMRA structure");
+
+ const auto HandleTagMD = [this](MDNode *TagMD) {
+ addTag(cast<MDString>(TagMD->getOperand(0))->getString(),
+ cast<MDString>(TagMD->getOperand(1))->getString());
+ };
+
+ if (isTagMD(Tuple)) {
+ HandleTagMD(Tuple);
+ return;
+ }
+
+ for (const MDOperand &Op : Tuple->operands()) {
+ MDNode *MDOp = cast<MDNode>(Op.get());
+ assert(isTagMD(MDOp));
+ HandleTagMD(MDOp);
+ }
+}
+
+bool MMRAMetadata::isTagMD(const Metadata *MD) {
+ if (auto *Tuple = dyn_cast<MDTuple>(MD)) {
+ return Tuple->getNumOperands() == 2 &&
+ isa<MDString>(Tuple->getOperand(0)) &&
+ isa<MDString>(Tuple->getOperand(1));
+ }
+ return false;
+}
+
+bool MMRAMetadata::isCompatibleWith(const MMRAMetadata &Other) const {
+ // Two sets of tags are compatible iff, for every unique tag prefix P
+ // present in at least one set:
+ // - the other set contains no tag with prefix P, or
+ // - at least one tag with prefix P is common to both sets.
+
+ StringMap<bool> PrefixStatuses;
+ for (const auto &[P, S] : Tags)
+ PrefixStatuses[P] |= (Other.hasTag(P, S) || !Other.hasTagWithPrefix(P));
+ for (const auto &[P, S] : Other)
+ PrefixStatuses[P] |= (hasTag(P, S) || !hasTagWithPrefix(P));
+
+ for (auto &[Prefix, Status] : PrefixStatuses) {
+ if (!Status)
+ return false;
+ }
+
+ return true;
+}
+
+MMRAMetadata MMRAMetadata::combine(const MMRAMetadata &Other) const {
+ // Let A and B be two tags set, and U be the prefix-wise union of A and B.
+ // For every unique tag prefix P present in A or B:
+ // * If either A or B has no tags with prefix P, no tags with prefix
+ // P are added to U.
+ // * If both A and B have at least one tag with prefix P, all tags with prefix
+ // P from both sets are added to U.
+
+ MMRAMetadata U;
+ for (const auto &[P, S] : Tags) {
+ if (Other.hasTagWithPrefix(P))
+ U.addTag(P, S);
+ }
+ for (const auto &[P, S] : Other.Tags) {
+ if (hasTagWithPrefix(P))
+ U.addTag(P, S);
+ }
+
+ return U;
+}
+
+MMRAMetadata &MMRAMetadata::addTag(StringRef Prefix, StringRef Suffix) {
+ Tags.insert(std::make_pair(Prefix.str(), Suffix.str()));
+ return *this;
+}
+
+bool MMRAMetadata::hasTag(StringRef Prefix, StringRef Suffix) const {
+ return Tags.count(std::make_pair(Prefix.str(), Suffix.str()));
+}
+
+std::vector<MMRAMetadata::TagT>
+MMRAMetadata::getAllTagsWithPrefix(StringRef Prefix) const {
+ std::vector<TagT> Result;
+ for (const auto &T : Tags) {
+ if (T.first == Prefix)
+ Result.push_back(T);
+ }
+ return Result;
+}
+
+bool MMRAMetadata::hasTagWithPrefix(StringRef Prefix) const {
+ for (const auto &[P, S] : Tags)
+ if (P == Prefix)
+ return true;
+ return false;
+}
+
+MMRAMetadata::const_iterator MMRAMetadata::begin() const {
+ return Tags.begin();
+}
+
+MMRAMetadata::const_iterator MMRAMetadata::end() const { return Tags.end(); }
+
+bool MMRAMetadata::empty() const { return Tags.empty(); }
+
+unsigned MMRAMetadata::size() const { return Tags.size(); }
+
+static MDTuple *getMDForPair(LLVMContext &Ctx, StringRef P, StringRef S) {
+ return MDTuple::get(Ctx, {MDString::get(Ctx, P), MDString::get(Ctx, S)});
+}
+
+MDTuple *MMRAMetadata::getAsMD(LLVMContext &Ctx) const {
+ if (empty())
+ return MDTuple::get(Ctx, {});
+
+ std::vector<Metadata *> TagMDs;
+ TagMDs.reserve(Tags.size());
+
+ for (const auto &[P, S] : Tags)
+ TagMDs.push_back(getMDForPair(Ctx, P, S));
+
+ if (TagMDs.size() == 1)
+ return cast<MDTuple>(TagMDs.front());
+ return MDTuple::get(Ctx, TagMDs);
+}
+
+void MMRAMetadata::print(raw_ostream &OS) const {
+ bool IsFirst = true;
+ // TODO: use map_iter + join
+ for (const auto &[P, S] : Tags) {
+ if (IsFirst)
+ IsFirst = false;
+ else
+ OS << ", ";
+ OS << P << ":" << S;
+ }
+}
+
+LLVM_DUMP_METHOD
+void MMRAMetadata::dump() const { print(dbgs()); }
+
+static bool isReadWriteMemCall(const Instruction &I) {
+ if (const auto *C = dyn_cast<CallBase>(&I))
+ return C->mayReadOrWriteMemory() ||
+ !C->getMemoryEffects().doesNotAccessMemory();
+ return false;
+}
+
+bool llvm::canInstructionHaveMMRAs(const Instruction &I) {
+ return isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicCmpXchgInst>(I) ||
+ isa<AtomicRMWInst>(I) || isa<FenceInst>(I) || isReadWriteMemCall(I);
+}
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 33f358440a312d..748e8bc009bf78 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -99,6 +99,7 @@
#include "llvm/IR/IntrinsicsNVPTX.h"
#include "llvm/IR/IntrinsicsWebAssembly.h"
#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/ModuleSlotTracker.h"
@@ -116,6 +117,7 @@
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
+#include "llvm/Support/ModRef.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <cassert>
@@ -529,6 +531,7 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
void visitMemProfMetadata(Instruction &I, MDNode *MD);
void visitCallsiteMetadata(Instruction &I, MDNode *MD);
void visitDIAssignIDMetadata(Instruction &I, MDNode *MD);
+ void visitMMRAMetadata(Instruction &I, MDNode *MD);
void visitAnnotationMetadata(MDNode *Annotation);
void visitAliasScopeMetadata(const MDNode *MD);
void visitAliasScopeListMetadata(const MDNode *MD);
@@ -4817,6 +4820,31 @@ void Verifier::visitDIAssignIDMetadata(Instruction &I, MDNode *MD) {
}
}
+void Verifier::visitMMRAMetadata(Instruction &I, MDNode *MD) {
+ Check(canInstructionHaveMMRAs(I),
+ "!mmra metadata attached to unexpected instruction kind", I, MD);
+
+ const auto IsLeaf = [](const Metadata *CurMD) {
+ const MDNode *Tuple = dyn_cast<MDTuple>(CurMD);
+ return Tuple && Tuple->getNumOperands() == 2 &&
+ isa<MDString>(Tuple->getOperand(0)) &&
+ isa<MDString>(Tuple->getOperand(1));
+ };
+
+ // MMRA Metadata should either be a tag, e.g. !{!"foo", !"bar"}, or a
+ // list of tags such as !2 in the following example:
+ // !0 = !{!"a", !"b"}
+ // !1 = !{!"c", !"d"}
+ // !2 = !{!0, !1}
+ if (MMRAMetadata::isTagMD(MD))
+ return;
+
+ Check(isa<MDTuple>(MD), "!mmra expected to be a metadata tuple", I, MD);
+ for (const MDOperand &MDOp : MD->operands())
+ Check(MMRAMetadata::isTagMD(MDOp.get()),
+ "!mmra metadata tuple operand is not an MMRA tag", I, MDOp.get());
+}
+
void Verifier::visitCallStackMetadata(MDNode *MD) {
// Call stack metadata should consist of a list of at least 1 constant int
// (representing a hash of the location).
@@ -5134,6 +5162,9 @@ void Verifier::visitInstruction(Instruction &I) {
if (MDNode *MD = I.getMetadata(LLVMContext::MD_DIAssignID))
visitDIAssignIDMetadata(I, MD);
+ if (MDNode *MMRA = I.getMetadata(LLVMContext::MD_mmra))
+ visitMMRAMetadata(I, MMRA);
+
if (MDNode *Annotation = I.getMetadata(LLVMContext::MD_annotation))
visitAnnotationMetadata(Annotation);
diff --git a/llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp b/llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
index d65054a6ff9d5f..76b4640340fcab 100644
--- a/llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
+++ b/llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
@@ -367,7 +367,7 @@ bool MergedLoadStoreMotion::run(Function &F, AliasAnalysis &AA) {
// Merge unconditional branches, allowing PRE to catch more
// optimization opportunities.
- // This loop doesn't care about newly inserted/split blocks
+ // This loop doesn't care about newly inserted/split blocks
// since they never will be diamond heads.
for (BasicBlock &BB : make_early_inc_range(F))
// Hoist equivalent loads and sink stores
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 380bac9c618077..85759e15b4ccde 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -59,6 +59,7 @@
#include "llvm/IR/IntrinsicsWebAssembly.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PatternMatch.h"
@@ -3284,6 +3285,9 @@ void llvm::combineMetadata(Instruction *K, const Instruction *J,
case LLVMContext::MD_invariant_group:
// Preserve !invariant.group in K.
break;
+ case LLVMContext::MD_mmra:
+ // Combine MMRAs
+ break;
case LLVMContext::MD_align:
if (DoesKMove || !K->hasMetadata(LLVMContext::MD_noundef))
K->setMetadata(
@@ -3322,6 +3326,16 @@ void llvm::combineMetadata(Instruction *K, const Instruction *J,
if (auto *JMD = J->getMetadata(LLVMContext::MD_invariant_group))
if (isa<LoadInst>(K) || isa<StoreInst>(K))
K->setMetadata(LLVMContext::MD_invariant_group, JMD);
+
+ // Merge MMRAs.
+ // This is handled separately because we also want to handle cases where K
+ // doesn't have tags but J does.
+ auto JTags = MMRAMetadata(J->getMetadata(LLVMContext::MD_mmra));
+ auto KTags = MMRAMetadata(K->getMetadata(LLVMContext::MD_mmra));
+ if (JTags || KTags) {
+ K->setMetadata(LLVMContext::MD_mmra,
+ JTags.combine(KTags).getAsMD(K->getContext()));
+ }
}
void llvm::combineMetadataForCSE(Instruction *K, const Instruction *J,
@@ -3341,7 +3355,8 @@ void llvm::combineMetadataForCSE(Instruction *K, const Instruction *J,
LLVMContext::MD_preserve_access_index,
LLVMContext::MD_prof,
LLVMContext::MD_nontemporal,
- LLVMContext::MD_noundef};
+ LLVMContext::MD_noundef,
+ LLVMContext::MD_mmra};
combineMetadata(K, J, KnownIDs, KDominatesJ);
}
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 55bbffb18879fb..cfdff6a082d03d 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -51,6 +51,7 @@
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/NoFolder.h"
@@ -1677,7 +1678,8 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB,
for (auto &SuccIter : OtherSuccIterRange) {
Instruction *I2 = &*SuccIter;
HasTerminator |= I2->isTerminator();
- if (AllInstsAreIdentical && !I1->isIdenticalToWhenDefined(I2))
+ if (AllInstsAreIdentical && (!I1->isIdenticalToWhenDefined(I2) ||
+ MMRAMetadata(*I1) != MMRAMetadata(*I2)))
AllInstsAreIdentical = false;
}
@@ -1964,6 +1966,7 @@ static bool canSinkInstructions(
}
const Instruction *I0 = Insts.front();
+ const auto I0MMRA = MMRAMetadata(*I0);
for (auto *I : Insts) {
if (!I->isSameOperationAs(I0))
return false;
@@ -1975,6 +1978,11 @@ static bool canSinkInstructions(
return false;
if (isa<LoadInst>(I) && I->getOperand(0)->isSwiftError())
return false;
+
+ // Treat MMRAs conservatively. This pass can be quite aggressive and
+ // could drop a lot of MMRAs otherwise.
+ if (MMRAMetadata(*I) != I0MMRA)
+ return false;
}
// All instructions in Insts are known to be the same opcode. If they have a
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/mmra.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/mmra.ll
new file mode 100644
index 00000000000000..71a2d3e8a5304f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/mmra.ll
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -stop-after=finalize-isel < %s | FileCheck %s
+
+declare void @readsMem(ptr) #0
+declare void @writesMem(ptr) #1
+
+define void @fence_loads(ptr %ptr) {
+ ; CHECK-LABEL: name: fence_loads
+ ; CHECK: bb.1 (%ir-block.0):
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1
+ ; CHECK-NEXT: ATOMIC_FENCE 5, 1, mmra !0
+ ; CHECK-NEXT: [[FLAT_LOAD_UBYTE:%[0-9]+]]:vgpr_32 = FLAT_LOAD_UBYTE [[REG_SEQUENCE]], 0, 0, implicit $exec, implicit $flat_scr, mmra !1 :: (load acquire (s8) from %ir.ptr, align 4)
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
+ ; CHECK-NEXT: FLAT_STORE_BYTE [[REG_SEQUENCE]], [[COPY2]], 0, 0, implicit $exec, implicit $flat_scr, mmra !2 :: (store release (s8) into %ir.ptr, align 4)
+ ; CHECK-NEXT: SI_RETURN
+ fence release, !mmra !0
+ %ld = load atomic i8, ptr %ptr acquire, align 4, !mmra !2
+ store atomic i8 1, ptr %ptr release, align 4, !mmra !1
+ ret void
+}
+
+; TODO: test atomicrmw, cmpxchg - current lowering doesn't work and blows up on i1 PHIs.
+
+attributes #0 = { memory(read) }
+attributes #1 = { memory(write) }
+
+!0 = !{!"foo", !"bar"}
+!1 = !{!"bux", !"baz"}
+!2 = !{!0, !1}
diff --git a/llvm/test/CodeGen/AMDGPU/mmra.ll b/llvm/test/CodeGen/AMDGPU/mmra.ll
new file mode 100644
index 00000000000000..d9b48f79739b67
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mmra.ll
@@ -0,0 +1,189 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -march=amdgcn -mcpu=gfx900 -stop-after=finalize-isel < %s | FileCheck %s
+
+declare void @readsMem(ptr) #0
+declare void @writesMem(ptr) #1
+
+define void @fence_loads(ptr %ptr) {
+ ; CHECK-LABEL: name: fence_loads
+ ; CHECK: bb.0 (%ir-block.0):
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF1:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1
+ ; CHECK-NEXT: ATOMIC_FENCE 5, 1, mmra !0
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]], mmra !1
+ ; CHECK-NEXT: [[FLAT_LOAD_UBYTE:%[0-9]+]]:vgpr_32 = FLAT_LOAD_UBYTE [[COPY2]], 0, 0, implicit $exec, implicit $flat_scr, mmra !1 :: (load acquire (s8) from %ir.ptr, align 4)
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 1, mmra !2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]], mmra !2
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]], mmra !2
+ ; CHECK-NEXT: FLAT_STORE_BYTE [[COPY3]], killed [[COPY4]], 0, 0, implicit $exec, implicit $flat_scr, mmra !2 :: (store release (s8) into %ir.ptr, align 4)
+ ; CHECK-NEXT: SI_RETURN
+ fence release, !mmra !0
+ %ld = load atomic i8, ptr %ptr acquire, align 4, !mmra !2
+ store atomic i8 1, ptr %ptr release, align 4, !mmra !1
+ ret void
+}
+
+define void @atomicrmw_acq(ptr %ptr) {
+ ; CHECK-LABEL: name: atomicrmw_acq
+ ; CHECK: bb.0 (%ir-block.0):
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF1:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]], mmra !1
+ ; CHECK-NEXT: [[FLAT_LOAD_UBYTE:%[0-9]+]]:vgpr_32 = FLAT_LOAD_UBYTE killed [[COPY2]], 0, 0, implicit $exec, implicit $flat_scr, mmra !1 :: (load acquire (s8) from %ir.ptr)
+ ; CHECK-NEXT: SI_RETURN
+ %old.2 = atomicrmw add ptr %ptr, i8 0 acquire, !mmra !2
+ ret void
+}
+
+define void @atomicrmw_rel(ptr %ptr) {
+ ; CHECK-LABEL: name: atomicrmw_rel
+ ; CHECK: bb.0 (%ir-block.0):
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF1:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[REG_SEQUENCE]].sub1
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[REG_SEQUENCE]].sub0
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 -4
+ ; CHECK-NEXT: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY3]], killed [[S_MOV_B32_]], implicit $exec
+ ; CHECK-NEXT: [[DEF2:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF3:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_AND_B32_e64_]], %subreg.sub0, [[COPY2]], %subreg.sub1
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE1]]
+ ; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 3
+ ; CHECK-NEXT: [[V_AND_B32_e64_1:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY3]], [[S_MOV_B32_1]], implicit $exec
+ ; CHECK-NEXT: [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 [[S_MOV_B32_1]], killed [[V_AND_B32_e64_1]], implicit $exec
+ ; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 255
+ ; CHECK-NEXT: [[V_LSHLREV_B32_e64_1:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 killed [[V_LSHLREV_B32_e64_]], killed [[S_MOV_B32_2]], implicit $exec
+ ; CHECK-NEXT: [[V_NOT_B32_e32_:%[0-9]+]]:vgpr_32 = V_NOT_B32_e32 [[V_LSHLREV_B32_e64_1]], implicit $exec
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE1]], mmra !2
+ ; CHECK-NEXT: [[FLAT_LOAD_DWORD:%[0-9]+]]:vgpr_32 = FLAT_LOAD_DWORD [[COPY5]], 0, 0, implicit $exec, implicit $flat_scr, mmra !2 :: (load (s32) from %ir.AlignedAddr)
+ ; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1.atomicrmw.start:
+ ; CHECK-NEXT: successors: %bb.2(0x04000000), %bb.1(0x7c000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:sreg_64 = PHI [[S_MOV_B64_]], %bb.0, %7, %bb.1
+ ; CHECK-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[FLAT_LOAD_DWORD]], %bb.0, %6, %bb.1
+ ; CHECK-NEXT: [[V_OR_B32_e64_:%[0-9]+]]:vgpr_32 = V_OR_B32_e64 [[V_NOT_B32_e32_]], [[V_LSHLREV_B32_e64_1]], implicit $exec
+ ; CHECK-NEXT: [[V_AND_B32_e64_2:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[PHI1]], killed [[V_OR_B32_e64_]], implicit $exec
+ ; CHECK-NEXT: [[DEF4:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF5:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_AND_B32_e64_2]], %subreg.sub0, [[PHI1]], %subreg.sub1, mmra !2
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE2]], mmra !2
+ ; CHECK-NEXT: [[FLAT_ATOMIC_CMPSWAP_RTN:%[0-9]+]]:vgpr_32 = FLAT_ATOMIC_CMPSWAP_RTN [[COPY4]], killed [[COPY6]], 0, 1, implicit $exec, implicit $flat_scr, mmra !2 :: (load store release monotonic (s32) on %ir.AlignedAddr)
+ ; CHECK-NEXT: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 [[FLAT_ATOMIC_CMPSWAP_RTN]], [[PHI1]], implicit $exec, mmra !2
+ ; CHECK-NEXT: [[SI_IF_BREAK:%[0-9]+]]:sreg_64 = SI_IF_BREAK killed [[V_CMP_EQ_U32_e64_]], [[PHI]], implicit-def dead $scc
+ ; CHECK-NEXT: SI_LOOP [[SI_IF_BREAK]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; CHECK-NEXT: S_BRANCH %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2.atomicrmw.end:
+ ; CHECK-NEXT: [[PHI2:%[0-9]+]]:sreg_64 = PHI [[SI_IF_BREAK]], %bb.1
+ ; CHECK-NEXT: SI_END_CF [[PHI2]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; CHECK-NEXT: SI_RETURN
+ %old.2 = atomicrmw add ptr %ptr, i8 0 release, !mmra !1
+ ret void
+}
+
+define void @cmpxchg(ptr %ptr) {
+ ; CHECK-LABEL: name: cmpxchg
+ ; CHECK: bb.0 (%ir-block.0):
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF1:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[REG_SEQUENCE]].sub1
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[REG_SEQUENCE]].sub0
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 -4
+ ; CHECK-NEXT: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY3]], killed [[S_MOV_B32_]], implicit $exec
+ ; CHECK-NEXT: [[DEF2:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF3:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_AND_B32_e64_]], %subreg.sub0, [[COPY2]], %subreg.sub1
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE1]]
+ ; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 3
+ ; CHECK-NEXT: [[V_AND_B32_e64_1:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY3]], [[S_MOV_B32_1]], implicit $exec
+ ; CHECK-NEXT: [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 [[S_MOV_B32_1]], killed [[V_AND_B32_e64_1]], implicit $exec
+ ; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 255
+ ; CHECK-NEXT: [[V_LSHLREV_B32_e64_1:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 [[V_LSHLREV_B32_e64_]], killed [[S_MOV_B32_2]], implicit $exec
+ ; CHECK-NEXT: [[V_NOT_B32_e32_:%[0-9]+]]:vgpr_32 = V_NOT_B32_e32 killed [[V_LSHLREV_B32_e64_1]], implicit $exec
+ ; CHECK-NEXT: [[S_MOV_B32_3:%[0-9]+]]:sreg_32 = S_MOV_B32 1
+ ; CHECK-NEXT: [[V_LSHLREV_B32_e64_2:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 [[V_LSHLREV_B32_e64_]], killed [[S_MOV_B32_3]], implicit $exec
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE1]], mmra !1
+ ; CHECK-NEXT: [[FLAT_LOAD_DWORD:%[0-9]+]]:vgpr_32 = FLAT_LOAD_DWORD [[COPY5]], 0, 0, implicit $exec, implicit $flat_scr, mmra !1 :: (load (s32) from %ir.AlignedAddr)
+ ; CHECK-NEXT: [[V_AND_B32_e64_2:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 killed [[FLAT_LOAD_DWORD]], [[V_NOT_B32_e32_]], implicit $exec
+ ; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 0
+ ; CHECK-NEXT: [[DEF4:%[0-9]+]]:sreg_64 = IMPLICIT_DEF
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1.partword.cmpxchg.loop:
+ ; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.3(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:sreg_64 = PHI [[DEF4]], %bb.0, %12, %bb.3
+ ; CHECK-NEXT: [[PHI1:%[0-9]+]]:sreg_64 = PHI [[S_MOV_B64_]], %bb.0, %13, %bb.3
+ ; CHECK-NEXT: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[V_AND_B32_e64_2]], %bb.0, %11, %bb.3
+ ; CHECK-NEXT: [[V_OR_B32_e64_:%[0-9]+]]:vgpr_32 = V_OR_B32_e64 [[PHI2]], [[V_LSHLREV_B32_e64_2]], implicit $exec
+ ; CHECK-NEXT: [[DEF5:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF6:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_OR_B32_e64_]], %subreg.sub0, [[PHI2]], %subreg.sub1, mmra !1
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE2]], mmra !1
+ ; CHECK-NEXT: [[FLAT_ATOMIC_CMPSWAP_RTN:%[0-9]+]]:vgpr_32 = FLAT_ATOMIC_CMPSWAP_RTN [[COPY4]], killed [[COPY6]], 0, 1, implicit $exec, implicit $flat_scr, mmra !1 :: (load store acquire acquire (s32) on %ir.AlignedAddr)
+ ; CHECK-NEXT: [[V_CMP_NE_U32_e64_:%[0-9]+]]:sreg_64 = V_CMP_NE_U32_e64 [[FLAT_ATOMIC_CMPSWAP_RTN]], [[PHI2]], implicit $exec
+ ; CHECK-NEXT: [[S_MOV_B64_1:%[0-9]+]]:sreg_64 = S_MOV_B64 -1
+ ; CHECK-NEXT: [[DEF7:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[DEF7]]
+ ; CHECK-NEXT: [[S_OR_B64_:%[0-9]+]]:sreg_64 = S_OR_B64 [[PHI]], $exec, implicit-def $scc
+ ; CHECK-NEXT: [[SI_IF:%[0-9]+]]:sreg_64 = SI_IF killed [[V_CMP_NE_U32_e64_]], %bb.3, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; CHECK-NEXT: S_BRANCH %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2.partword.cmpxchg.failure:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[V_AND_B32_e64_3:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[FLAT_ATOMIC_CMPSWAP_RTN]], [[V_NOT_B32_e32_]], implicit $exec
+ ; CHECK-NEXT: [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 [[PHI2]], [[V_AND_B32_e64_3]], implicit $exec
+ ; CHECK-NEXT: [[S_ANDN2_B64_:%[0-9]+]]:sreg_64 = S_ANDN2_B64 [[S_OR_B64_]], $exec, implicit-def $scc
+ ; CHECK-NEXT: [[S_AND_B64_:%[0-9]+]]:sreg_64 = S_AND_B64 [[V_CMP_EQ_U32_e64_]], $exec, implicit-def $scc
+ ; CHECK-NEXT: [[S_OR_B64_1:%[0-9]+]]:sreg_64 = S_OR_B64 [[S_ANDN2_B64_]], [[S_AND_B64_]], implicit-def $scc
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3.Flow:
+ ; CHECK-NEXT: successors: %bb.4(0x04000000), %bb.1(0x7c000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI3:%[0-9]+]]:sreg_64 = PHI [[S_OR_B64_]], %bb.1, [[S_OR_B64_1]], %bb.2
+ ; CHECK-NEXT: [[PHI4:%[0-9]+]]:vgpr_32 = PHI [[COPY7]], %bb.1, [[V_AND_B32_e64_3]], %bb.2
+ ; CHECK-NEXT: SI_END_CF [[SI_IF]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; CHECK-NEXT: [[COPY8:%[0-9]+]]:sreg_64 = COPY [[PHI3]]
+ ; CHECK-NEXT: [[SI_IF_BREAK:%[0-9]+]]:sreg_64 = SI_IF_BREAK [[COPY8]], [[PHI1]], implicit-def dead $scc
+ ; CHECK-NEXT: SI_LOOP [[SI_IF_BREAK]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; CHECK-NEXT: S_BRANCH %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4.partword.cmpxchg.end:
+ ; CHECK-NEXT: [[PHI5:%[0-9]+]]:sreg_64 = PHI [[SI_IF_BREAK]], %bb.3
+ ; CHECK-NEXT: [[PHI6:%[0-9]+]]:vgpr_32 = PHI [[FLAT_ATOMIC_CMPSWAP_RTN]], %bb.3
+ ; CHECK-NEXT: SI_END_CF [[PHI5]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; CHECK-NEXT: SI_RETURN
+ %pair = cmpxchg ptr %ptr, i8 0, i8 1 acquire acquire, !mmra !2
+ ret void
+}
+
+attributes #0 = { memory(read) }
+attributes #1 = { memory(write) }
+
+!0 = !{!"foo", !"bar"}
+!1 = !{!"bux", !"baz"}
+!2 = !{!0, !1}
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll
new file mode 100644
index 00000000000000..d51e9291a6119c
--- /dev/null
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll
@@ -0,0 +1,204 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-each -atomic-expand %s | FileCheck -check-prefix=GFX90A %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-each -atomic-expand %s | FileCheck -check-prefix=GFX1100 %s
+
+; Contains a variety of tests with different types of atomic expansions to check that MMRAs are
+; preserved.
+
+define i16 @test_atomicrmw_xchg_i16_global_agent(ptr addrspace(1) %ptr, i16 %value) {
+; GFX90A-LABEL: define i16 @test_atomicrmw_xchg_i16_global_agent(
+; GFX90A-SAME: ptr addrspace(1) [[PTR:%.*]], i16 [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] {
+; GFX90A-NEXT: [[ALIGNEDADDR:%.*]] = call ptr addrspace(1) @llvm.ptrmask.p1.i64(ptr addrspace(1) [[PTR]], i64 -4)
+; GFX90A-NEXT: [[TMP1:%.*]] = ptrtoint ptr addrspace(1) [[PTR]] to i64
+; GFX90A-NEXT: [[PTRLSB:%.*]] = and i64 [[TMP1]], 3
+; GFX90A-NEXT: [[TMP2:%.*]] = shl i64 [[PTRLSB]], 3
+; GFX90A-NEXT: [[SHIFTAMT:%.*]] = trunc i64 [[TMP2]] to i32
+; GFX90A-NEXT: [[MASK:%.*]] = shl i32 65535, [[SHIFTAMT]]
+; GFX90A-NEXT: [[INV_MASK:%.*]] = xor i32 [[MASK]], -1
+; GFX90A-NEXT: [[TMP3:%.*]] = zext i16 [[VALUE]] to i32
+; GFX90A-NEXT: [[VALOPERAND_SHIFTED:%.*]] = shl i32 [[TMP3]], [[SHIFTAMT]]
+; GFX90A-NEXT: [[TMP4:%.*]] = load i32, ptr addrspace(1) [[ALIGNEDADDR]], align 4, !mmra [[META0:![0-9]+]]
+; GFX90A-NEXT: br label [[ATOMICRMW_START:%.*]]
+; GFX90A: atomicrmw.start:
+; GFX90A-NEXT: [[LOADED:%.*]] = phi i32 [ [[TMP4]], [[TMP0:%.*]] ], [ [[NEWLOADED:%.*]], [[ATOMICRMW_START]] ]
+; GFX90A-NEXT: [[TMP5:%.*]] = and i32 [[LOADED]], [[INV_MASK]]
+; GFX90A-NEXT: [[TMP6:%.*]] = or i32 [[TMP5]], [[VALOPERAND_SHIFTED]]
+; GFX90A-NEXT: [[TMP7:%.*]] = cmpxchg ptr addrspace(1) [[ALIGNEDADDR]], i32 [[LOADED]], i32 [[TMP6]] syncscope("agent") seq_cst seq_cst, align 4, !mmra [[META0]]
+; GFX90A-NEXT: [[SUCCESS:%.*]] = extractvalue { i32, i1 } [[TMP7]], 1
+; GFX90A-NEXT: [[NEWLOADED]] = extractvalue { i32, i1 } [[TMP7]], 0
+; GFX90A-NEXT: br i1 [[SUCCESS]], label [[ATOMICRMW_END:%.*]], label [[ATOMICRMW_START]]
+; GFX90A: atomicrmw.end:
+; GFX90A-NEXT: [[SHIFTED:%.*]] = lshr i32 [[NEWLOADED]], [[SHIFTAMT]]
+; GFX90A-NEXT: [[EXTRACTED:%.*]] = trunc i32 [[SHIFTED]] to i16
+; GFX90A-NEXT: ret i16 [[EXTRACTED]]
+;
+; GFX1100-LABEL: define i16 @test_atomicrmw_xchg_i16_global_agent(
+; GFX1100-SAME: ptr addrspace(1) [[PTR:%.*]], i16 [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] {
+; GFX1100-NEXT: [[ALIGNEDADDR:%.*]] = call ptr addrspace(1) @llvm.ptrmask.p1.i64(ptr addrspace(1) [[PTR]], i64 -4)
+; GFX1100-NEXT: [[TMP1:%.*]] = ptrtoint ptr addrspace(1) [[PTR]] to i64
+; GFX1100-NEXT: [[PTRLSB:%.*]] = and i64 [[TMP1]], 3
+; GFX1100-NEXT: [[TMP2:%.*]] = shl i64 [[PTRLSB]], 3
+; GFX1100-NEXT: [[SHIFTAMT:%.*]] = trunc i64 [[TMP2]] to i32
+; GFX1100-NEXT: [[MASK:%.*]] = shl i32 65535, [[SHIFTAMT]]
+; GFX1100-NEXT: [[INV_MASK:%.*]] = xor i32 [[MASK]], -1
+; GFX1100-NEXT: [[TMP3:%.*]] = zext i16 [[VALUE]] to i32
+; GFX1100-NEXT: [[VALOPERAND_SHIFTED:%.*]] = shl i32 [[TMP3]], [[SHIFTAMT]]
+; GFX1100-NEXT: [[TMP4:%.*]] = load i32, ptr addrspace(1) [[ALIGNEDADDR]], align 4, !mmra [[META0:![0-9]+]]
+; GFX1100-NEXT: br label [[ATOMICRMW_START:%.*]]
+; GFX1100: atomicrmw.start:
+; GFX1100-NEXT: [[LOADED:%.*]] = phi i32 [ [[TMP4]], [[TMP0:%.*]] ], [ [[NEWLOADED:%.*]], [[ATOMICRMW_START]] ]
+; GFX1100-NEXT: [[TMP5:%.*]] = and i32 [[LOADED]], [[INV_MASK]]
+; GFX1100-NEXT: [[TMP6:%.*]] = or i32 [[TMP5]], [[VALOPERAND_SHIFTED]]
+; GFX1100-NEXT: [[TMP7:%.*]] = cmpxchg ptr addrspace(1) [[ALIGNEDADDR]], i32 [[LOADED]], i32 [[TMP6]] syncscope("agent") seq_cst seq_cst, align 4, !mmra [[META0]]
+; GFX1100-NEXT: [[SUCCESS:%.*]] = extractvalue { i32, i1 } [[TMP7]], 1
+; GFX1100-NEXT: [[NEWLOADED]] = extractvalue { i32, i1 } [[TMP7]], 0
+; GFX1100-NEXT: br i1 [[SUCCESS]], label [[ATOMICRMW_END:%.*]], label [[ATOMICRMW_START]]
+; GFX1100: atomicrmw.end:
+; GFX1100-NEXT: [[SHIFTED:%.*]] = lshr i32 [[NEWLOADED]], [[SHIFTAMT]]
+; GFX1100-NEXT: [[EXTRACTED:%.*]] = trunc i32 [[SHIFTED]] to i16
+; GFX1100-NEXT: ret i16 [[EXTRACTED]]
+;
+ %res = atomicrmw xchg ptr addrspace(1) %ptr, i16 %value syncscope("agent") seq_cst, !mmra !2
+ ret i16 %res
+}
+
+define i16 @test_cmpxchg_i16_global_agent_align4(ptr addrspace(1) %out, i16 %in, i16 %old) {
+; GFX90A-LABEL: define i16 @test_cmpxchg_i16_global_agent_align4(
+; GFX90A-SAME: ptr addrspace(1) [[OUT:%.*]], i16 [[IN:%.*]], i16 [[OLD:%.*]]) #[[ATTR0]] {
+; GFX90A-NEXT: [[GEP:%.*]] = getelementptr i16, ptr addrspace(1) [[OUT]], i64 4
+; GFX90A-NEXT: [[TMP1:%.*]] = zext i16 [[IN]] to i32
+; GFX90A-NEXT: [[TMP2:%.*]] = zext i16 [[OLD]] to i32
+; GFX90A-NEXT: [[TMP3:%.*]] = load i32, ptr addrspace(1) [[GEP]], align 4, !mmra [[META0]]
+; GFX90A-NEXT: [[TMP4:%.*]] = and i32 [[TMP3]], -65536
+; GFX90A-NEXT: br label [[PARTWORD_CMPXCHG_LOOP:%.*]]
+; GFX90A: partword.cmpxchg.loop:
+; GFX90A-NEXT: [[TMP5:%.*]] = phi i32 [ [[TMP4]], [[TMP0:%.*]] ], [ [[TMP11:%.*]], [[PARTWORD_CMPXCHG_FAILURE:%.*]] ]
+; GFX90A-NEXT: [[TMP6:%.*]] = or i32 [[TMP5]], [[TMP1]]
+; GFX90A-NEXT: [[TMP7:%.*]] = or i32 [[TMP5]], [[TMP2]]
+; GFX90A-NEXT: [[TMP8:%.*]] = cmpxchg ptr addrspace(1) [[GEP]], i32 [[TMP7]], i32 [[TMP6]] seq_cst seq_cst, align 4, !mmra [[META0]]
+; GFX90A-NEXT: [[TMP9:%.*]] = extractvalue { i32, i1 } [[TMP8]], 0
+; GFX90A-NEXT: [[TMP10:%.*]] = extractvalue { i32, i1 } [[TMP8]], 1
+; GFX90A-NEXT: br i1 [[TMP10]], label [[PARTWORD_CMPXCHG_END:%.*]], label [[PARTWORD_CMPXCHG_FAILURE]]
+; GFX90A: partword.cmpxchg.failure:
+; GFX90A-NEXT: [[TMP11]] = and i32 [[TMP9]], -65536
+; GFX90A-NEXT: [[TMP12:%.*]] = icmp ne i32 [[TMP5]], [[TMP11]]
+; GFX90A-NEXT: br i1 [[TMP12]], label [[PARTWORD_CMPXCHG_LOOP]], label [[PARTWORD_CMPXCHG_END]]
+; GFX90A: partword.cmpxchg.end:
+; GFX90A-NEXT: [[EXTRACTED:%.*]] = trunc i32 [[TMP9]] to i16
+; GFX90A-NEXT: [[TMP13:%.*]] = insertvalue { i16, i1 } poison, i16 [[EXTRACTED]], 0
+; GFX90A-NEXT: [[TMP14:%.*]] = insertvalue { i16, i1 } [[TMP13]], i1 [[TMP10]], 1
+; GFX90A-NEXT: [[EXTRACT:%.*]] = extractvalue { i16, i1 } [[TMP14]], 0
+; GFX90A-NEXT: ret i16 [[EXTRACT]]
+;
+; GFX1100-LABEL: define i16 @test_cmpxchg_i16_global_agent_align4(
+; GFX1100-SAME: ptr addrspace(1) [[OUT:%.*]], i16 [[IN:%.*]], i16 [[OLD:%.*]]) #[[ATTR0]] {
+; GFX1100-NEXT: [[GEP:%.*]] = getelementptr i16, ptr addrspace(1) [[OUT]], i64 4
+; GFX1100-NEXT: [[TMP1:%.*]] = zext i16 [[IN]] to i32
+; GFX1100-NEXT: [[TMP2:%.*]] = zext i16 [[OLD]] to i32
+; GFX1100-NEXT: [[TMP3:%.*]] = load i32, ptr addrspace(1) [[GEP]], align 4, !mmra [[META0]]
+; GFX1100-NEXT: [[TMP4:%.*]] = and i32 [[TMP3]], -65536
+; GFX1100-NEXT: br label [[PARTWORD_CMPXCHG_LOOP:%.*]]
+; GFX1100: partword.cmpxchg.loop:
+; GFX1100-NEXT: [[TMP5:%.*]] = phi i32 [ [[TMP4]], [[TMP0:%.*]] ], [ [[TMP11:%.*]], [[PARTWORD_CMPXCHG_FAILURE:%.*]] ]
+; GFX1100-NEXT: [[TMP6:%.*]] = or i32 [[TMP5]], [[TMP1]]
+; GFX1100-NEXT: [[TMP7:%.*]] = or i32 [[TMP5]], [[TMP2]]
+; GFX1100-NEXT: [[TMP8:%.*]] = cmpxchg ptr addrspace(1) [[GEP]], i32 [[TMP7]], i32 [[TMP6]] seq_cst seq_cst, align 4, !mmra [[META0]]
+; GFX1100-NEXT: [[TMP9:%.*]] = extractvalue { i32, i1 } [[TMP8]], 0
+; GFX1100-NEXT: [[TMP10:%.*]] = extractvalue { i32, i1 } [[TMP8]], 1
+; GFX1100-NEXT: br i1 [[TMP10]], label [[PARTWORD_CMPXCHG_END:%.*]], label [[PARTWORD_CMPXCHG_FAILURE]]
+; GFX1100: partword.cmpxchg.failure:
+; GFX1100-NEXT: [[TMP11]] = and i32 [[TMP9]], -65536
+; GFX1100-NEXT: [[TMP12:%.*]] = icmp ne i32 [[TMP5]], [[TMP11]]
+; GFX1100-NEXT: br i1 [[TMP12]], label [[PARTWORD_CMPXCHG_LOOP]], label [[PARTWORD_CMPXCHG_END]]
+; GFX1100: partword.cmpxchg.end:
+; GFX1100-NEXT: [[EXTRACTED:%.*]] = trunc i32 [[TMP9]] to i16
+; GFX1100-NEXT: [[TMP13:%.*]] = insertvalue { i16, i1 } poison, i16 [[EXTRACTED]], 0
+; GFX1100-NEXT: [[TMP14:%.*]] = insertvalue { i16, i1 } [[TMP13]], i1 [[TMP10]], 1
+; GFX1100-NEXT: [[EXTRACT:%.*]] = extractvalue { i16, i1 } [[TMP14]], 0
+; GFX1100-NEXT: ret i16 [[EXTRACT]]
+;
+ %gep = getelementptr i16, ptr addrspace(1) %out, i64 4
+ %res = cmpxchg ptr addrspace(1) %gep, i16 %old, i16 %in seq_cst seq_cst, align 4, !mmra !2
+ %extract = extractvalue {i16, i1} %res, 0
+ ret i16 %extract
+}
+
+define void @syncscope_workgroup_nortn(ptr %addr, float %val) #0 {
+; GFX90A-LABEL: define void @syncscope_workgroup_nortn(
+; GFX90A-SAME: ptr [[ADDR:%.*]], float [[VAL:%.*]]) #[[ATTR1:[0-9]+]] {
+; GFX90A-NEXT: br label [[ATOMICRMW_CHECK_SHARED:%.*]]
+; GFX90A: atomicrmw.check.shared:
+; GFX90A-NEXT: [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[ADDR]])
+; GFX90A-NEXT: br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
+; GFX90A: atomicrmw.shared:
+; GFX90A-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[ADDR]] to ptr addrspace(3)
+; GFX90A-NEXT: [[TMP2:%.*]] = atomicrmw fadd ptr addrspace(3) [[TMP1]], float [[VAL]] syncscope("workgroup") seq_cst, align 4, !mmra [[META0]]
+; GFX90A-NEXT: br label [[ATOMICRMW_PHI:%.*]]
+; GFX90A: atomicrmw.check.private:
+; GFX90A-NEXT: [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[ADDR]])
+; GFX90A-NEXT: br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
+; GFX90A: atomicrmw.private:
+; GFX90A-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[ADDR]] to ptr addrspace(5)
+; GFX90A-NEXT: [[LOADED_PRIVATE:%.*]] = load float, ptr addrspace(5) [[TMP3]], align 4
+; GFX90A-NEXT: [[VAL_NEW:%.*]] = fadd float [[LOADED_PRIVATE]], [[VAL]]
+; GFX90A-NEXT: store float [[VAL_NEW]], ptr addrspace(5) [[TMP3]], align 4
+; GFX90A-NEXT: br label [[ATOMICRMW_PHI]]
+; GFX90A: atomicrmw.global:
+; GFX90A-NEXT: [[TMP4:%.*]] = addrspacecast ptr [[ADDR]] to ptr addrspace(1)
+; GFX90A-NEXT: [[TMP5:%.*]] = atomicrmw fadd ptr addrspace(1) [[TMP4]], float [[VAL]] syncscope("workgroup") seq_cst, align 4, !mmra [[META0]]
+; GFX90A-NEXT: br label [[ATOMICRMW_PHI]]
+; GFX90A: atomicrmw.phi:
+; GFX90A-NEXT: [[LOADED_PHI:%.*]] = phi float [ [[TMP2]], [[ATOMICRMW_SHARED]] ], [ [[LOADED_PRIVATE]], [[ATOMICRMW_PRIVATE]] ], [ [[TMP5]], [[ATOMICRMW_GLOBAL]] ]
+; GFX90A-NEXT: br label [[ATOMICRMW_END:%.*]]
+; GFX90A: atomicrmw.end:
+; GFX90A-NEXT: ret void
+;
+; GFX1100-LABEL: define void @syncscope_workgroup_nortn(
+; GFX1100-SAME: ptr [[ADDR:%.*]], float [[VAL:%.*]]) #[[ATTR1:[0-9]+]] {
+; GFX1100-NEXT: [[RES:%.*]] = atomicrmw fadd ptr [[ADDR]], float [[VAL]] syncscope("workgroup") seq_cst, align 4, !mmra [[META0]]
+; GFX1100-NEXT: ret void
+;
+ %res = atomicrmw fadd ptr %addr, float %val syncscope("workgroup") seq_cst, !mmra !2
+ ret void
+}
+
+define i32 @atomic_load_global_align1(ptr addrspace(1) %ptr) {
+; GFX90A-LABEL: define i32 @atomic_load_global_align1(
+; GFX90A-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR0]] {
+; GFX90A-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(1) [[PTR]] to ptr
+; GFX90A-NEXT: [[TMP2:%.*]] = alloca i32, align 4, addrspace(5)
+; GFX90A-NEXT: call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) [[TMP2]])
+; GFX90A-NEXT: call void @__atomic_load(i64 4, ptr [[TMP1]], ptr addrspace(5) [[TMP2]], i32 5)
+; GFX90A-NEXT: [[TMP3:%.*]] = load i32, ptr addrspace(5) [[TMP2]], align 4
+; GFX90A-NEXT: call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[TMP2]])
+; GFX90A-NEXT: ret i32 [[TMP3]]
+;
+; GFX1100-LABEL: define i32 @atomic_load_global_align1(
+; GFX1100-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR0]] {
+; GFX1100-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(1) [[PTR]] to ptr
+; GFX1100-NEXT: [[TMP2:%.*]] = alloca i32, align 4, addrspace(5)
+; GFX1100-NEXT: call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) [[TMP2]])
+; GFX1100-NEXT: call void @__atomic_load(i64 4, ptr [[TMP1]], ptr addrspace(5) [[TMP2]], i32 5)
+; GFX1100-NEXT: [[TMP3:%.*]] = load i32, ptr addrspace(5) [[TMP2]], align 4
+; GFX1100-NEXT: call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[TMP2]])
+; GFX1100-NEXT: ret i32 [[TMP3]]
+;
+ %val = load atomic i32, ptr addrspace(1) %ptr seq_cst, align 1, !mmra !2
+ ret i32 %val
+}
+
+attributes #0 = { "amdgpu-unsafe-fp-atomics"="true" }
+
+!0 = !{!"foo", !"bar"}
+!1 = !{!"bux", !"baz"}
+!2 = !{!0, !1}
+;.
+; GFX90A: [[META0]] = !{[[META1:![0-9]+]], [[META2:![0-9]+]]}
+; GFX90A: [[META1]] = !{!"foo", !"bar"}
+; GFX90A: [[META2]] = !{!"bux", !"baz"}
+;.
+; GFX1100: [[META0]] = !{[[META1:![0-9]+]], [[META2:![0-9]+]]}
+; GFX1100: [[META1]] = !{!"foo", !"bar"}
+; GFX1100: [[META2]] = !{!"bux", !"baz"}
+;.
diff --git a/llvm/test/Transforms/SimplifyCFG/mmra.ll b/llvm/test/Transforms/SimplifyCFG/mmra.ll
new file mode 100644
index 00000000000000..6670657471376b
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/mmra.ll
@@ -0,0 +1,150 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S %s | FileCheck %s
+
+; RUN: opt -passes='simplifycfg<sink-common-insts;hoist-common-insts>,verify' -S %s | FileCheck %s
+
+declare void @clobber1()
+declare void @clobber2()
+
+define void @sink(ptr %arg, i1 %c) {
+; CHECK-LABEL: define void @sink(
+; CHECK-SAME: ptr [[ARG:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT: bb:
+; CHECK-NEXT: br i1 [[C]], label [[THEN:%.*]], label [[ELSE:%.*]]
+; CHECK: then:
+; CHECK-NEXT: call void @clobber1()
+; CHECK-NEXT: store ptr null, ptr [[ARG]], align 8
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: else:
+; CHECK-NEXT: call void @clobber2()
+; CHECK-NEXT: store ptr null, ptr [[ARG]], align 8, !mmra [[META0:![0-9]+]]
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+bb:
+ br i1 %c, label %then, label %else
+
+then:
+ call void @clobber1()
+ store ptr null, ptr %arg, align 8
+ br label %exit
+
+else:
+ call void @clobber2()
+ store ptr null, ptr %arg, align 8, !mmra !0
+ br label %exit
+
+exit:
+ ret void
+}
+
+define void @hoist_store(ptr %arg, i1 %c) {
+; CHECK-LABEL: define void @hoist_store(
+; CHECK-SAME: ptr [[ARG:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT: bb:
+; CHECK-NEXT: br i1 [[C]], label [[THEN:%.*]], label [[ELSE:%.*]]
+; CHECK: then:
+; CHECK-NEXT: store ptr null, ptr [[ARG]], align 8
+; CHECK-NEXT: call void @clobber1()
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: else:
+; CHECK-NEXT: store ptr null, ptr [[ARG]], align 8, !mmra [[META0]]
+; CHECK-NEXT: call void @clobber2()
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+bb:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr null, ptr %arg, align 8
+ call void @clobber1()
+ br label %exit
+
+else:
+ store ptr null, ptr %arg, align 8, !mmra !0
+ call void @clobber2()
+ br label %exit
+
+exit:
+ ret void
+}
+
+define ptr @sink_load(ptr %arg, i1 %c) {
+; CHECK-LABEL: define ptr @sink_load(
+; CHECK-SAME: ptr [[ARG:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT: bb:
+; CHECK-NEXT: br i1 [[C]], label [[THEN:%.*]], label [[ELSE:%.*]]
+; CHECK: then:
+; CHECK-NEXT: call void @clobber1()
+; CHECK-NEXT: [[L1:%.*]] = load ptr, ptr [[ARG]], align 8
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: else:
+; CHECK-NEXT: call void @clobber2()
+; CHECK-NEXT: [[L2:%.*]] = load ptr, ptr [[ARG]], align 8, !mmra [[META0]]
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: [[P:%.*]] = phi ptr [ [[L1]], [[THEN]] ], [ [[L2]], [[ELSE]] ]
+; CHECK-NEXT: ret ptr [[P]]
+;
+bb:
+ br i1 %c, label %then, label %else
+
+then:
+ call void @clobber1()
+ %l1 = load ptr, ptr %arg, align 8
+ br label %exit
+
+else:
+ call void @clobber2()
+ %l2 = load ptr, ptr %arg, align 8, !mmra !0
+ br label %exit
+
+exit:
+ %p = phi ptr [ %l1, %then ], [ %l2, %else ]
+ ret ptr %p
+}
+
+define ptr @hoist_load(ptr %arg, i1 %c) {
+; CHECK-LABEL: define ptr @hoist_load(
+; CHECK-SAME: ptr [[ARG:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT: bb:
+; CHECK-NEXT: br i1 [[C]], label [[THEN:%.*]], label [[ELSE:%.*]]
+; CHECK: then:
+; CHECK-NEXT: [[L1:%.*]] = load ptr, ptr [[ARG]], align 8
+; CHECK-NEXT: call void @clobber1()
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: else:
+; CHECK-NEXT: [[L2:%.*]] = load ptr, ptr [[ARG]], align 8, !mmra [[META0]]
+; CHECK-NEXT: call void @clobber2()
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: [[P:%.*]] = phi ptr [ [[L1]], [[THEN]] ], [ [[L2]], [[ELSE]] ]
+; CHECK-NEXT: ret ptr [[P]]
+;
+bb:
+ br i1 %c, label %then, label %else
+
+then:
+ %l1 = load ptr, ptr %arg, align 8
+ call void @clobber1()
+ br label %exit
+
+else:
+ %l2 = load ptr, ptr %arg, align 8, !mmra !0
+ call void @clobber2()
+ br label %exit
+
+exit:
+ %p = phi ptr [ %l1, %then ], [ %l2, %else ]
+ ret ptr %p
+}
+
+
+!0 = !{!"foo", !"bar"}
+
+;.
+; CHECK: [[META0]] = !{!"foo", !"bar"}
+;.
diff --git a/llvm/test/Verifier/mmra-allowed.ll b/llvm/test/Verifier/mmra-allowed.ll
new file mode 100644
index 00000000000000..76dff3f207cdf3
--- /dev/null
+++ b/llvm/test/Verifier/mmra-allowed.ll
@@ -0,0 +1,31 @@
+; RUN: opt -S -passes=verify < %s
+
+; This file contains MMRA metadata that is okay and should pass the verifier.
+
+define void @test(ptr %ptr) {
+ %ld = load i8, ptr %ptr, !mmra !0
+ store i8 1, ptr %ptr, !mmra !1
+ call void @writesMem(), !mmra !2
+ call void @readsMem(), !mmra !2
+ fence release, !mmra !0
+ %rmw.1 = atomicrmw add ptr %ptr, i8 0 release, !mmra !0
+ %rmw.2 = atomicrmw add ptr %ptr, i8 0 acquire, !mmra !0
+ %pair = cmpxchg ptr %ptr, i8 0, i8 1 acquire acquire, !mmra !1
+ %ld.atomic = load atomic i8, ptr %ptr acquire, align 4, !mmra !1
+ store atomic i8 1, ptr %ptr release, align 4, !mmra !2
+ %mld = call <2 x i64> @llvm.vp.load.v2i64.p0(ptr undef, <2 x i1> undef, i32 undef), !mmra !2
+ ; TODO: barrier
+ ret void
+}
+
+declare <2 x i64> @llvm.vp.load.v2i64.p0(ptr, <2 x i1>, i32)
+
+declare void @readsMem(ptr) #0
+declare void @writesMem(ptr) #1
+
+attributes #0 = { memory(read) }
+attributes #1 = { memory(write) }
+
+!0 = !{!"scope", !"workgroup"}
+!1 = !{!"as", !"private"}
+!2 = !{!0, !1}
diff --git a/llvm/test/Verifier/mmra.ll b/llvm/test/Verifier/mmra.ll
new file mode 100644
index 00000000000000..b506d593a1c433
--- /dev/null
+++ b/llvm/test/Verifier/mmra.ll
@@ -0,0 +1,43 @@
+; RUN: not opt -S -passes=verify < %s 2>&1 | FileCheck %s
+
+define void @foo(ptr %ptr, i32 %x) {
+
+ ; CHECK: !mmra metadata attached to unexpected instruction kind
+ ; CHECK-NEXT: %bad.add
+ %bad.add = add i32 %x, 42, !mmra !{}
+
+ ; CHECK: !mmra metadata attached to unexpected instruction kind
+ ; CHECK-NEXT: %bad.sub
+ %bad.sub = sub i32 %x, 42, !mmra !{}
+
+ ; CHECK: !mmra metadata attached to unexpected instruction kind
+ ; CHECK-NEXT: %bad.sqrt
+ %bad.sqrt = call float @llvm.sqrt.f32(float undef), !mmra !{}
+
+ ; CHECK: !mmra expected to be a metadata tuple
+ ; CHECK-NEXT: %bad.md0
+ ; CHECK-NEXT: !DIFile
+ %bad.md0 = load atomic i32, ptr %ptr acquire, align 4, !mmra !0
+
+ ; CHECK: !mmra expected to be a metadata tuple
+ ; CHECK-NEXT: %bad.md1
+ ; CHECK-NEXT: !DIFile
+ %bad.md1 = load atomic i32, ptr %ptr acquire, align 4, !mmra !0
+
+ ; CHECK: !mmra metadata tuple operand is not an MMRA tag
+ ; CHECK-NEXT: %bad.md2
+ ; CHECK-NEXT: !"foo"
+ %bad.md2 = load atomic i32, ptr %ptr acquire, align 4, !mmra !1
+
+ ; CHECK: !mmra metadata tuple operand is not an MMRA tag
+ ; CHECK-NEXT: %bad.md3
+ ; CHECK-NEXT: !"baz"
+ %bad.md3 = load atomic i32, ptr %ptr acquire, align 4, !mmra !2
+ ret void
+}
+
+declare float @llvm.sqrt.f32(float)
+
+!0 = !DIFile(filename: "test.c", directory: "")
+!1 = !{!"foo", !"bar", !"bux"}
+!2 = !{!"baz", !0}
diff --git a/llvm/unittests/CodeGen/MachineInstrTest.cpp b/llvm/unittests/CodeGen/MachineInstrTest.cpp
index 49da0c38eefdca..c579ffc514c9b7 100644
--- a/llvm/unittests/CodeGen/MachineInstrTest.cpp
+++ b/llvm/unittests/CodeGen/MachineInstrTest.cpp
@@ -18,6 +18,7 @@
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/IR/ModuleSlotTracker.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCSymbol.h"
@@ -277,12 +278,14 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
MCSymbol *Sym2 = MC->createTempSymbol("post_label", false);
MDNode *HAM = MDNode::getDistinct(Ctx, std::nullopt);
MDNode *PCS = MDNode::getDistinct(Ctx, std::nullopt);
+ MDNode *MMRA = MMRAMetadata().addTag("foo", "bar").getAsMD(Ctx);
ASSERT_TRUE(MI->memoperands_empty());
ASSERT_FALSE(MI->getPreInstrSymbol());
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_FALSE(MI->getMMRAMetadata());
MI->setMemRefs(*MF, MMOs);
ASSERT_TRUE(MI->memoperands().size() == 1);
@@ -290,6 +293,7 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_FALSE(MI->getMMRAMetadata());
MI->setPreInstrSymbol(*MF, Sym1);
ASSERT_TRUE(MI->memoperands().size() == 1);
@@ -297,6 +301,7 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_FALSE(MI->getMMRAMetadata());
MI->setPostInstrSymbol(*MF, Sym2);
ASSERT_TRUE(MI->memoperands().size() == 1);
@@ -304,6 +309,7 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
ASSERT_TRUE(MI->getPostInstrSymbol() == Sym2);
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_FALSE(MI->getMMRAMetadata());
MI->setHeapAllocMarker(*MF, HAM);
ASSERT_TRUE(MI->memoperands().size() == 1);
@@ -311,6 +317,7 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
ASSERT_TRUE(MI->getPostInstrSymbol() == Sym2);
ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_FALSE(MI->getMMRAMetadata());
MI->setPCSections(*MF, PCS);
ASSERT_TRUE(MI->memoperands().size() == 1);
@@ -318,6 +325,21 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
ASSERT_TRUE(MI->getPostInstrSymbol() == Sym2);
ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_FALSE(MI->getMMRAMetadata());
+
+ MI->setMMRAMetadata(*MF, MMRA);
+ ASSERT_TRUE(MI->memoperands().size() == 1);
+ ASSERT_TRUE(MI->getPreInstrSymbol() == Sym1);
+ ASSERT_TRUE(MI->getPostInstrSymbol() == Sym2);
+ ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
+ ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA);
+
+ // Check with nothing but MMRAs.
+ MachineInstr *MMRAMI = MF->CreateMachineInstr(MCID, DebugLoc());
+ ASSERT_FALSE(MMRAMI->getMMRAMetadata());
+ MMRAMI->setMMRAMetadata(*MF, MMRA);
+ ASSERT_TRUE(MMRAMI->getMMRAMetadata() == MMRA);
}
TEST(MachineInstrExtraInfo, ChangeExtraInfo) {
@@ -338,11 +360,15 @@ TEST(MachineInstrExtraInfo, ChangeExtraInfo) {
MDNode *HAM = MDNode::getDistinct(Ctx, std::nullopt);
MDNode *PCS = MDNode::getDistinct(Ctx, std::nullopt);
+ MDNode *MMRA1 = MMRAMetadata().addTag("foo", "bar").getAsMD(Ctx);
+ MDNode *MMRA2 = MMRAMetadata().addTag("bar", "bux").getAsMD(Ctx);
+
MI->setMemRefs(*MF, MMOs);
MI->setPreInstrSymbol(*MF, Sym1);
MI->setPostInstrSymbol(*MF, Sym2);
MI->setHeapAllocMarker(*MF, HAM);
MI->setPCSections(*MF, PCS);
+ MI->setMMRAMetadata(*MF, MMRA1);
MMOs.push_back(MMO);
@@ -352,6 +378,7 @@ TEST(MachineInstrExtraInfo, ChangeExtraInfo) {
ASSERT_TRUE(MI->getPostInstrSymbol() == Sym2);
ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA1);
MI->setPostInstrSymbol(*MF, Sym1);
ASSERT_TRUE(MI->memoperands().size() == 2);
@@ -359,6 +386,15 @@ TEST(MachineInstrExtraInfo, ChangeExtraInfo) {
ASSERT_TRUE(MI->getPostInstrSymbol() == Sym1);
ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA1);
+
+ MI->setMMRAMetadata(*MF, MMRA2);
+ ASSERT_TRUE(MI->memoperands().size() == 2);
+ ASSERT_TRUE(MI->getPreInstrSymbol() == Sym1);
+ ASSERT_TRUE(MI->getPostInstrSymbol() == Sym1);
+ ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
+ ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA2);
}
TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
@@ -380,11 +416,14 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
MDNode *HAM = MDNode::getDistinct(Ctx, std::nullopt);
MDNode *PCS = MDNode::getDistinct(Ctx, std::nullopt);
+ MDNode *MMRA = MMRAMetadata().getAsMD(Ctx);
+
MI->setMemRefs(*MF, MMOs);
MI->setPreInstrSymbol(*MF, Sym1);
MI->setPostInstrSymbol(*MF, Sym2);
MI->setHeapAllocMarker(*MF, HAM);
MI->setPCSections(*MF, PCS);
+ MI->setMMRAMetadata(*MF, MMRA);
MI->setPostInstrSymbol(*MF, nullptr);
ASSERT_TRUE(MI->memoperands().size() == 2);
@@ -392,6 +431,7 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_TRUE(MI->getHeapAllocMarker() == HAM);
ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA);
MI->setHeapAllocMarker(*MF, nullptr);
ASSERT_TRUE(MI->memoperands().size() == 2);
@@ -399,6 +439,7 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_TRUE(MI->getPCSections() == PCS);
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA);
MI->setPCSections(*MF, nullptr);
ASSERT_TRUE(MI->memoperands().size() == 2);
@@ -406,6 +447,7 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA);
MI->setPreInstrSymbol(*MF, nullptr);
ASSERT_TRUE(MI->memoperands().size() == 2);
@@ -413,6 +455,7 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA);
MI->setMemRefs(*MF, {});
ASSERT_TRUE(MI->memoperands_empty());
@@ -420,6 +463,15 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
ASSERT_FALSE(MI->getPostInstrSymbol());
ASSERT_FALSE(MI->getHeapAllocMarker());
ASSERT_FALSE(MI->getPCSections());
+ ASSERT_TRUE(MI->getMMRAMetadata() == MMRA);
+
+ MI->setMMRAMetadata(*MF, nullptr);
+ ASSERT_TRUE(MI->memoperands_empty());
+ ASSERT_FALSE(MI->getPreInstrSymbol());
+ ASSERT_FALSE(MI->getPostInstrSymbol());
+ ASSERT_FALSE(MI->getHeapAllocMarker());
+ ASSERT_FALSE(MI->getPCSections());
+ ASSERT_FALSE(MI->getMMRAMetadata());
}
TEST(MachineInstrDebugValue, AddDebugValueOperand) {
diff --git a/llvm/unittests/IR/CMakeLists.txt b/llvm/unittests/IR/CMakeLists.txt
index 803164b8f1eac6..a03b0711ba33f0 100644
--- a/llvm/unittests/IR/CMakeLists.txt
+++ b/llvm/unittests/IR/CMakeLists.txt
@@ -31,6 +31,7 @@ add_llvm_unittest(IRTests
IntrinsicsTest.cpp
LegacyPassManagerTest.cpp
MDBuilderTest.cpp
+ MemoryModelRelaxationAnnotationsTest.cpp
ManglerTest.cpp
MetadataTest.cpp
ModuleTest.cpp
diff --git a/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp b/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
new file mode 100644
index 00000000000000..801b7567a1268d
--- /dev/null
+++ b/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
@@ -0,0 +1,285 @@
+//===- llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp ---------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/IR/Module.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+
+namespace {
+
+TEST(MMRATest, Order) {
+ MMRAMetadata A, B;
+
+ std::array<MMRAMetadata::TagT, 5> Tags{{{"opencl-fence-mem", "local"},
+ {"opencl-fence-mem", "global"},
+ {"foo", "0"},
+ {"foo", "2"},
+ {"foo", "4"}}};
+
+ // Test that ordering does not matter.
+ for (unsigned K : {0, 2, 3, 1, 4})
+ A.addTag(Tags[K]);
+ for (unsigned K : {2, 3, 0, 4, 1})
+ B.addTag(Tags[K]);
+
+ EXPECT_EQ(A, B);
+}
+
+TEST(MMRATest, MDParse) {
+ LLVMContext Ctx;
+
+ // No nesting:
+ // !{!"foo", "!bar"}
+ MDNode *FooBar =
+ MDTuple::get(Ctx, {MDString::get(Ctx, "foo"), MDString::get(Ctx, "bar")});
+ MMRAMetadata FooBarMMRA(FooBar);
+
+ EXPECT_EQ(FooBarMMRA.size(), 1u);
+ EXPECT_EQ(FooBarMMRA, MMRAMetadata().addTag("foo", "bar"));
+
+ // Nested:
+ // !{!{!"foo", "!bar"}, !{!"bux", !"qux"}}
+ MDNode *BuxQux =
+ MDTuple::get(Ctx, {MDString::get(Ctx, "bux"), MDString::get(Ctx, "qux")});
+ MDNode *Nested = MDTuple::get(Ctx, {FooBar, BuxQux});
+ MMRAMetadata NestedMMRA(Nested);
+
+ EXPECT_EQ(NestedMMRA.size(), 2u);
+ EXPECT_EQ(NestedMMRA,
+ MMRAMetadata().addTag("foo", "bar").addTag("bux", "qux"));
+}
+
+TEST(MMRATest, MDEmit) {
+ LLVMContext Ctx;
+
+ // Simple MD.
+ // !{!"foo", "!bar"}
+ {
+ MMRAMetadata FooBarMMRA = MMRAMetadata().addTag("foo", "bar");
+ MDTuple *FooBar = FooBarMMRA.getAsMD(Ctx);
+
+ ASSERT_NE(FooBar, nullptr);
+ ASSERT_EQ(FooBar->getNumOperands(), 2u);
+ MDString *Foo = dyn_cast<MDString>(FooBar->getOperand(0));
+ MDString *Bar = dyn_cast<MDString>(FooBar->getOperand(1));
+ ASSERT_NE(Foo, nullptr);
+ ASSERT_NE(Bar, nullptr);
+ EXPECT_EQ(Foo->getString(), std::string("foo"));
+ EXPECT_EQ(Bar->getString(), std::string("bar"));
+ }
+
+ // Nested MD
+ // !{!{!"foo", "!bar"}, !{!"bux", !"qux"}}
+ {
+ MMRAMetadata NestedMMRA =
+ MMRAMetadata().addTag("foo", "bar").addTag("bux", "qux");
+ MDTuple *Nested = NestedMMRA.getAsMD(Ctx);
+
+ ASSERT_NE(Nested, nullptr);
+ ASSERT_EQ(Nested->getNumOperands(), 2u);
+ MDTuple *BuxQux = dyn_cast<MDTuple>(Nested->getOperand(0));
+ MDTuple *FooBar = dyn_cast<MDTuple>(Nested->getOperand(1));
+ ASSERT_NE(FooBar, nullptr);
+ ASSERT_NE(BuxQux, nullptr);
+ ASSERT_EQ(FooBar->getNumOperands(), 2u);
+ ASSERT_EQ(BuxQux->getNumOperands(), 2u);
+
+ MDString *Foo = dyn_cast<MDString>(FooBar->getOperand(0));
+ MDString *Bar = dyn_cast<MDString>(FooBar->getOperand(1));
+ MDString *Bux = dyn_cast<MDString>(BuxQux->getOperand(0));
+ MDString *Qux = dyn_cast<MDString>(BuxQux->getOperand(1));
+
+ EXPECT_EQ(Foo->getString(), std::string("foo"));
+ EXPECT_EQ(Bar->getString(), std::string("bar"));
+ EXPECT_EQ(Bux->getString(), std::string("bux"));
+ EXPECT_EQ(Qux->getString(), std::string("qux"));
+ }
+}
+
+TEST(MMRATest, Utility) {
+ using TagT = MMRAMetadata::TagT;
+
+ MMRAMetadata MMRA;
+ MMRA.addTag("foo", "0");
+ MMRA.addTag("foo", "1");
+ MMRA.addTag("bar", "x");
+
+ EXPECT_TRUE(MMRA.hasTagWithPrefix("foo"));
+ EXPECT_TRUE(MMRA.hasTagWithPrefix("bar"));
+ EXPECT_FALSE(MMRA.hasTagWithPrefix("x"));
+
+ EXPECT_TRUE(MMRA.hasTag("foo", "0"));
+ EXPECT_TRUE(MMRA.hasTag(TagT("foo", "0")));
+ EXPECT_TRUE(MMRA.hasTag("foo", "1"));
+ EXPECT_TRUE(MMRA.hasTag(TagT("foo", "1")));
+ EXPECT_TRUE(MMRA.hasTag("bar", "x"));
+ EXPECT_TRUE(MMRA.hasTag(TagT("bar", "x")));
+
+ auto AllFoo = MMRA.getAllTagsWithPrefix("foo");
+ auto AllBar = MMRA.getAllTagsWithPrefix("bar");
+ auto AllX = MMRA.getAllTagsWithPrefix("x");
+
+ EXPECT_EQ(AllFoo.size(), 2u);
+ EXPECT_EQ(AllBar.size(), 1u);
+ EXPECT_EQ(AllX.size(), 0u);
+
+ EXPECT_TRUE(is_contained(AllFoo, TagT("foo", "0")));
+ EXPECT_TRUE(is_contained(AllFoo, TagT("foo", "1")));
+ EXPECT_TRUE(is_contained(AllBar, TagT("bar", "x")));
+}
+
+TEST(MMRATest, Operators) {
+ MMRAMetadata A;
+ A.addTag("foo", "0");
+ A.addTag("bar", "x");
+
+ MMRAMetadata B;
+ B.addTag("foo", "0");
+ B.addTag("bar", "y");
+
+ // ensure we have different objects by creating copies.
+ EXPECT_EQ(MMRAMetadata(A), MMRAMetadata(A));
+ EXPECT_TRUE((bool)A);
+
+ EXPECT_EQ(MMRAMetadata(B), MMRAMetadata(B));
+ EXPECT_TRUE((bool)B);
+
+ EXPECT_NE(A, B);
+
+ EXPECT_EQ(MMRAMetadata(), MMRAMetadata());
+ EXPECT_NE(A, MMRAMetadata());
+ EXPECT_NE(B, MMRAMetadata());
+
+ MMRAMetadata Empty;
+ EXPECT_FALSE((bool)Empty);
+}
+
+TEST(MMRATest, Compatibility) {
+ MMRAMetadata Foo0;
+ Foo0.addTag("foo", "0");
+
+ MMRAMetadata Foo1;
+ Foo1.addTag("foo", "1");
+
+ MMRAMetadata Foo10;
+ Foo10.addTag("foo", "0");
+ Foo10.addTag("foo", "1");
+
+ MMRAMetadata Bar;
+ Bar.addTag("bar", "y");
+
+ MMRAMetadata Empty;
+
+ // Other set has no tag with same prefix
+ EXPECT_TRUE(Foo0.isCompatibleWith(Bar));
+ EXPECT_TRUE(Bar.isCompatibleWith(Foo0));
+
+ EXPECT_TRUE(Foo0.isCompatibleWith(Empty));
+ EXPECT_TRUE(Empty.isCompatibleWith(Foo0));
+
+ EXPECT_TRUE(Empty.isCompatibleWith(MMRAMetadata()));
+ EXPECT_TRUE(MMRAMetadata().isCompatibleWith(Empty));
+
+ // Other set has conflicting tags.
+ EXPECT_FALSE(Foo1.isCompatibleWith(Foo0));
+ EXPECT_FALSE(Foo0.isCompatibleWith(Foo1));
+
+ // Both have common tags.
+ EXPECT_TRUE(Foo0.isCompatibleWith(Foo0));
+ EXPECT_TRUE(Foo0.isCompatibleWith(Foo10));
+ EXPECT_TRUE(Foo10.isCompatibleWith(Foo0));
+
+ EXPECT_TRUE(Foo1.isCompatibleWith(Foo1));
+ EXPECT_TRUE(Foo1.isCompatibleWith(Foo10));
+ EXPECT_TRUE(Foo10.isCompatibleWith(Foo1));
+
+ // Try with more prefixes now:
+ MMRAMetadata Multiple0;
+ Multiple0.addTag("foo", "y");
+ Multiple0.addTag("foo", "x");
+ Multiple0.addTag("bar", "z");
+
+ MMRAMetadata Multiple1;
+ Multiple1.addTag("foo", "z");
+ Multiple1.addTag("foo", "x");
+ Multiple1.addTag("bar", "y");
+
+ MMRAMetadata Multiple2;
+ Multiple2.addTag("foo", "z");
+ Multiple2.addTag("foo", "x");
+ Multiple2.addTag("bux", "y");
+
+ // Multiple0 and Multiple1 are not compatible because "bar" is getting in the
+ // way.
+ EXPECT_FALSE(Multiple0.isCompatibleWith(Multiple1));
+ EXPECT_FALSE(Multiple1.isCompatibleWith(Multiple0));
+
+ EXPECT_TRUE(Multiple0.isCompatibleWith(Empty));
+ EXPECT_TRUE(Empty.isCompatibleWith(Multiple0));
+ EXPECT_TRUE(Multiple1.isCompatibleWith(Empty));
+ EXPECT_TRUE(Empty.isCompatibleWith(Multiple1));
+
+ // Multiple2 is compatible with both 1/0 because there is always "foo:x" in
+ // common, and the other prefixes are unique to each set.
+ EXPECT_TRUE(Multiple2.isCompatibleWith(Multiple0));
+ EXPECT_TRUE(Multiple0.isCompatibleWith(Multiple2));
+ EXPECT_TRUE(Multiple2.isCompatibleWith(Multiple1));
+ EXPECT_TRUE(Multiple1.isCompatibleWith(Multiple2));
+}
+
+TEST(MMRATest, Combine) {
+ MMRAMetadata Foo0;
+ Foo0.addTag("foo", "0");
+
+ MMRAMetadata Foo10;
+ Foo10.addTag("foo", "0");
+ Foo10.addTag("foo", "1");
+
+ MMRAMetadata Bar0;
+ Bar0.addTag("bar", "0");
+
+ MMRAMetadata BarFoo0;
+ BarFoo0.addTag("bar", "0");
+ BarFoo0.addTag("foo", "0");
+
+ {
+ // foo is common to both sets
+ MMRAMetadata Combined = Foo0.combine(Foo10);
+ EXPECT_EQ(Combined, Foo10);
+ }
+
+ {
+ // nothing is common
+ MMRAMetadata Combined = Foo0.combine(Bar0);
+ EXPECT_TRUE(Combined.empty());
+ }
+
+ {
+ // only foo is common.
+ MMRAMetadata Combined = BarFoo0.combine(Foo0);
+ EXPECT_EQ(Combined, Foo0);
+ }
+
+ {
+ // only bar is common.
+ MMRAMetadata Combined = BarFoo0.combine(Bar0);
+ EXPECT_EQ(Combined, Bar0);
+ }
+
+ {
+ // only foo is common
+ MMRAMetadata Combined = BarFoo0.combine(Foo10);
+ EXPECT_EQ(Combined, Foo10);
+ }
+}
+
+} // namespace
>From 5a4e1b41618d24facb9afe63a9276072806e58e8 Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Tue, 2 Apr 2024 10:07:14 +0200
Subject: [PATCH 2/4] Refactor API, Address comments
---
llvm/docs/ReleaseNotes.rst | 4 +-
.../IR/MemoryModelRelaxationAnnotations.h | 69 ++++---
llvm/lib/Analysis/VectorUtils.cpp | 4 +-
llvm/lib/CodeGen/AtomicExpandPass.cpp | 3 +-
.../IR/MemoryModelRelaxationAnnotations.cpp | 108 ++++-------
llvm/lib/Transforms/Utils/Local.cpp | 8 +-
llvm/unittests/CodeGen/MachineInstrTest.cpp | 8 +-
.../MemoryModelRelaxationAnnotationsTest.cpp | 178 +++++-------------
8 files changed, 138 insertions(+), 244 deletions(-)
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 7588048334d792..cbf0760aeb24b6 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -50,6 +50,8 @@ Update on required toolchains to build LLVM
Changes to the LLVM IR
----------------------
+- Added Memory Model Relaxation Annotations (MMRAs).
+
Changes to LLVM infrastructure
------------------------------
@@ -133,7 +135,7 @@ Changes to the C API
functions for accessing the values in a blockaddress constant.
* Added ``LLVMConstStringInContext2`` function, which better matches the C++
- API by using ``size_t`` for string length. Deprecated ``LLVMConstStringInContext``.
+ API by using ``size_t`` for string length. Deprecated ``LLVMConstStringInContext``.
* Added the following functions for accessing a function's prefix data:
diff --git a/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h b/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
index 88824dd59518b5..64b60728bc3e5d 100644
--- a/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
+++ b/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
@@ -5,34 +5,33 @@
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
+//
/// \file
/// This file provides utility for Memory Model Relaxation Annotations (MMRAs).
/// Those annotations are represented using Metadata. The MMRATagSet class
/// offers a simple API to parse the metadata and perform common operations on
/// it. The MMRAMetadata class is a simple tuple of MDNode that provides easy
/// access to all MMRA annotations on an instruction.
-///
+//
//===----------------------------------------------------------------------===//
#ifndef LLVM_IR_MEMORYMODELRELAXATIONANNOTATIONS_H
#define LLVM_IR_MEMORYMODELRELAXATIONANNOTATIONS_H
-#include <set>
-#include <string>
-#include <vector>
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/StringRef.h"
+#include <tuple> // for std::pair
namespace llvm {
class MDNode;
class MDTuple;
class Metadata;
-class StringRef;
class raw_ostream;
class LLVMContext;
class Instruction;
-/// Helper class for `!mmra` metadata nodes which can both build MMRA MDNodes,
-/// and parse them.
+/// Helper class to manipulate `!mmra` metadata nodes.
///
/// This can be visualized as a set of "tags", with each tag
/// representing a particular property of an instruction, as
@@ -44,49 +43,66 @@ class Instruction;
/// are a prefix/suffix pair of strings.
class MMRAMetadata {
public:
- using TagT = std::pair<std::string, std::string>;
- using SetT = std::set<TagT>;
+ using TagT = std::pair<StringRef, StringRef>;
+ using SetT = DenseSet<TagT>;
using const_iterator = SetT::const_iterator;
+ /// \name Constructors
+ /// @{
MMRAMetadata() = default;
MMRAMetadata(const Instruction &I);
MMRAMetadata(MDNode *MD);
+ /// @}
+
+ /// \name Metadata Helpers & Builders
+ /// @{
+
+ /// Combines \p A and \p B according to MMRA semantics.
+ /// \returns !mmra metadata for the combined MMRAs.
+ static MDNode *combine(LLVMContext &Ctx, const MMRAMetadata &A,
+ const MMRAMetadata &B);
+
+ /// Creates !mmra metadata for a single tag.
+ ///
+ /// !mmra metadata can either be a single tag, or a MDTuple containing
+ /// multiple tags.
+ static MDTuple *getTagMD(LLVMContext &Ctx, StringRef Prefix,
+ StringRef Suffix);
+ static MDTuple *getTagMD(LLVMContext &Ctx, const TagT &T) {
+ return getTagMD(Ctx, T.first, T.second);
+ }
+
+ /// \returns true if \p MD is a well-formed MMRA tag.
+ static bool isTagMD(const Metadata *MD);
+
+ /// @}
+
+ /// \name Compatibility Helpers
+ /// @{
/// \returns whether the MMRAs on \p A and \p B are compatible.
static bool checkCompatibility(const Instruction &A, const Instruction &B) {
return MMRAMetadata(A).isCompatibleWith(B);
}
- /// \returns true if \p MD is a well-formed MMRA tag.
- static bool isTagMD(const Metadata *MD);
-
/// \returns whether this set of tags is compatible with \p Other.
bool isCompatibleWith(const MMRAMetadata &Other) const;
- /// Combines this set of tags with \p Other.
- /// \returns a new set of tags containing the result.
- MMRAMetadata combine(const MMRAMetadata &Other) const;
+ /// @}
- MMRAMetadata &addTag(StringRef Prefix, StringRef Suffix);
- MMRAMetadata &addTag(const TagT &Tag) {
- Tags.insert(Tag);
- return *this;
- }
+ /// \name Content Queries
+ /// @{
bool hasTag(StringRef Prefix, StringRef Suffix) const;
- bool hasTag(const TagT &Tag) const { return Tags.count(Tag); }
-
bool hasTagWithPrefix(StringRef Prefix) const;
- std::vector<TagT> getAllTagsWithPrefix(StringRef Prefix) const;
-
- MDTuple *getAsMD(LLVMContext &Ctx) const;
-
const_iterator begin() const;
const_iterator end() const;
bool empty() const;
unsigned size() const;
+ /// @}
+
void print(raw_ostream &OS) const;
void dump() const;
@@ -102,6 +118,7 @@ class MMRAMetadata {
SetT Tags;
};
+/// \returns true if \p I can have !mmra metadata.
bool canInstructionHaveMMRAs(const Instruction &I);
} // namespace llvm
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index a86be5608ae7b3..917094267d05ae 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -802,9 +802,7 @@ Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef<Value *> VL) {
switch (Kind) {
case LLVMContext::MD_mmra: {
- auto Tags = MMRAMetadata(dyn_cast_or_null<MDTuple>(MD));
- auto ITags = MMRAMetadata(dyn_cast_or_null<MDTuple>(IMD));
- MD = Tags.combine(ITags).getAsMD(Inst->getContext());
+ MD = MMRAMetadata::combine(Inst->getContext(), MD, IMD);
break;
}
case LLVMContext::MD_tbaa:
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index 730877b7abb8da..5dac6739bd548f 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -141,7 +141,8 @@ struct ReplacementIRBuilder : IRBuilder<InstSimplifyFolder> {
SetInsertPoint(I);
if (BB->getParent()->getAttributes().hasFnAttr(Attribute::StrictFP))
this->setIsFPConstrained(true);
- this->CollectMetadataToCopy(I, {LLVMContext::MD_pcsections, LLVMContext::MD_mmra});
+ this->CollectMetadataToCopy(
+ I, {LLVMContext::MD_pcsections, LLVMContext::MD_mmra});
}
};
diff --git a/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp b/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
index 4d9b24c005f455..d2d2deb207bc8e 100644
--- a/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
+++ b/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
@@ -8,20 +8,14 @@
#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/ADT/StringSet.h"
+#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
-// FIXME: Only needed for canInstructionHaveMMRAs, should it move to another
-// file?
-#include "llvm/IR/Instructions.h"
-
using namespace llvm;
-static bool isTagMD(const MDNode *MD) {
- return isa<MDTuple>(MD) && MD->getNumOperands() == 2 &&
- isa<MDString>(MD->getOperand(0)) && isa<MDString>(MD->getOperand(1));
-}
+//===- MMRAMetadata -------------------------------------------------------===//
MMRAMetadata::MMRAMetadata(const Instruction &I)
: MMRAMetadata(I.getMetadata(LLVMContext::MD_mmra)) {}
@@ -37,8 +31,8 @@ MMRAMetadata::MMRAMetadata(MDNode *MD) {
assert(Tuple && "Invalid MMRA structure");
const auto HandleTagMD = [this](MDNode *TagMD) {
- addTag(cast<MDString>(TagMD->getOperand(0))->getString(),
- cast<MDString>(TagMD->getOperand(1))->getString());
+ Tags.insert({cast<MDString>(TagMD->getOperand(0))->getString(),
+ cast<MDString>(TagMD->getOperand(1))->getString()});
};
if (isTagMD(Tuple)) {
@@ -62,27 +56,14 @@ bool MMRAMetadata::isTagMD(const Metadata *MD) {
return false;
}
-bool MMRAMetadata::isCompatibleWith(const MMRAMetadata &Other) const {
- // Two sets of tags are compatible iff, for every unique tag prefix P
- // present in at least one set:
- // - the other set contains no tag with prefix P, or
- // - at least one tag with prefix P is common to both sets.
-
- StringMap<bool> PrefixStatuses;
- for (const auto &[P, S] : Tags)
- PrefixStatuses[P] |= (Other.hasTag(P, S) || !Other.hasTagWithPrefix(P));
- for (const auto &[P, S] : Other)
- PrefixStatuses[P] |= (hasTag(P, S) || !hasTagWithPrefix(P));
-
- for (auto &[Prefix, Status] : PrefixStatuses) {
- if (!Status)
- return false;
- }
-
- return true;
+MDTuple *MMRAMetadata::getTagMD(LLVMContext &Ctx, StringRef Prefix,
+ StringRef Suffix) {
+ return MDTuple::get(Ctx,
+ {MDString::get(Ctx, Prefix), MDString::get(Ctx, Suffix)});
}
-MMRAMetadata MMRAMetadata::combine(const MMRAMetadata &Other) const {
+MDNode *MMRAMetadata::combine(LLVMContext &Ctx, const MMRAMetadata &A,
+ const MMRAMetadata &B) {
// Let A and B be two tags set, and U be the prefix-wise union of A and B.
// For every unique tag prefix P present in A or B:
// * If either A or B has no tags with prefix P, no tags with prefix
@@ -90,36 +71,42 @@ MMRAMetadata MMRAMetadata::combine(const MMRAMetadata &Other) const {
// * If both A and B have at least one tag with prefix P, all tags with prefix
// P from both sets are added to U.
- MMRAMetadata U;
- for (const auto &[P, S] : Tags) {
- if (Other.hasTagWithPrefix(P))
- U.addTag(P, S);
+ SmallVector<Metadata *> Result;
+
+ for (const auto &[P, S] : A) {
+ if (B.hasTagWithPrefix(P))
+ Result.push_back(getTagMD(Ctx, P, S));
}
- for (const auto &[P, S] : Other.Tags) {
- if (hasTagWithPrefix(P))
- U.addTag(P, S);
+ for (const auto &[P, S] : B) {
+ if (A.hasTagWithPrefix(P))
+ Result.push_back(getTagMD(Ctx, P, S));
}
- return U;
-}
-
-MMRAMetadata &MMRAMetadata::addTag(StringRef Prefix, StringRef Suffix) {
- Tags.insert(std::make_pair(Prefix.str(), Suffix.str()));
- return *this;
+ return MDTuple::get(Ctx, Result);
}
bool MMRAMetadata::hasTag(StringRef Prefix, StringRef Suffix) const {
- return Tags.count(std::make_pair(Prefix.str(), Suffix.str()));
+ return Tags.count({Prefix, Suffix});
}
-std::vector<MMRAMetadata::TagT>
-MMRAMetadata::getAllTagsWithPrefix(StringRef Prefix) const {
- std::vector<TagT> Result;
- for (const auto &T : Tags) {
- if (T.first == Prefix)
- Result.push_back(T);
+bool MMRAMetadata::isCompatibleWith(const MMRAMetadata &Other) const {
+ // Two sets of tags are compatible iff, for every unique tag prefix P
+ // present in at least one set:
+ // - the other set contains no tag with prefix P, or
+ // - at least one tag with prefix P is common to both sets.
+
+ StringMap<bool> PrefixStatuses;
+ for (const auto &[P, S] : Tags)
+ PrefixStatuses[P] |= (Other.hasTag(P, S) || !Other.hasTagWithPrefix(P));
+ for (const auto &[P, S] : Other)
+ PrefixStatuses[P] |= (hasTag(P, S) || !hasTagWithPrefix(P));
+
+ for (auto &[Prefix, Status] : PrefixStatuses) {
+ if (!Status)
+ return false;
}
- return Result;
+
+ return true;
}
bool MMRAMetadata::hasTagWithPrefix(StringRef Prefix) const {
@@ -139,25 +126,6 @@ bool MMRAMetadata::empty() const { return Tags.empty(); }
unsigned MMRAMetadata::size() const { return Tags.size(); }
-static MDTuple *getMDForPair(LLVMContext &Ctx, StringRef P, StringRef S) {
- return MDTuple::get(Ctx, {MDString::get(Ctx, P), MDString::get(Ctx, S)});
-}
-
-MDTuple *MMRAMetadata::getAsMD(LLVMContext &Ctx) const {
- if (empty())
- return MDTuple::get(Ctx, {});
-
- std::vector<Metadata *> TagMDs;
- TagMDs.reserve(Tags.size());
-
- for (const auto &[P, S] : Tags)
- TagMDs.push_back(getMDForPair(Ctx, P, S));
-
- if (TagMDs.size() == 1)
- return cast<MDTuple>(TagMDs.front());
- return MDTuple::get(Ctx, TagMDs);
-}
-
void MMRAMetadata::print(raw_ostream &OS) const {
bool IsFirst = true;
// TODO: use map_iter + join
@@ -173,6 +141,8 @@ void MMRAMetadata::print(raw_ostream &OS) const {
LLVM_DUMP_METHOD
void MMRAMetadata::dump() const { print(dbgs()); }
+//===- Helpers ------------------------------------------------------------===//
+
static bool isReadWriteMemCall(const Instruction &I) {
if (const auto *C = dyn_cast<CallBase>(&I))
return C->mayReadOrWriteMemory() ||
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 85759e15b4ccde..489ffdb57cc6c4 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3330,11 +3330,11 @@ void llvm::combineMetadata(Instruction *K, const Instruction *J,
// Merge MMRAs.
// This is handled separately because we also want to handle cases where K
// doesn't have tags but J does.
- auto JTags = MMRAMetadata(J->getMetadata(LLVMContext::MD_mmra));
- auto KTags = MMRAMetadata(K->getMetadata(LLVMContext::MD_mmra));
- if (JTags || KTags) {
+ auto JMMRA = J->getMetadata(LLVMContext::MD_mmra);
+ auto KMMRA = K->getMetadata(LLVMContext::MD_mmra);
+ if (JMMRA || KMMRA) {
K->setMetadata(LLVMContext::MD_mmra,
- JTags.combine(KTags).getAsMD(K->getContext()));
+ MMRAMetadata::combine(K->getContext(), JMMRA, KMMRA));
}
}
diff --git a/llvm/unittests/CodeGen/MachineInstrTest.cpp b/llvm/unittests/CodeGen/MachineInstrTest.cpp
index c579ffc514c9b7..8ea12a6ec6453c 100644
--- a/llvm/unittests/CodeGen/MachineInstrTest.cpp
+++ b/llvm/unittests/CodeGen/MachineInstrTest.cpp
@@ -278,7 +278,7 @@ TEST(MachineInstrExtraInfo, AddExtraInfo) {
MCSymbol *Sym2 = MC->createTempSymbol("post_label", false);
MDNode *HAM = MDNode::getDistinct(Ctx, std::nullopt);
MDNode *PCS = MDNode::getDistinct(Ctx, std::nullopt);
- MDNode *MMRA = MMRAMetadata().addTag("foo", "bar").getAsMD(Ctx);
+ MDNode *MMRA = MMRAMetadata::getTagMD(Ctx, "foo", "bar");
ASSERT_TRUE(MI->memoperands_empty());
ASSERT_FALSE(MI->getPreInstrSymbol());
@@ -360,8 +360,8 @@ TEST(MachineInstrExtraInfo, ChangeExtraInfo) {
MDNode *HAM = MDNode::getDistinct(Ctx, std::nullopt);
MDNode *PCS = MDNode::getDistinct(Ctx, std::nullopt);
- MDNode *MMRA1 = MMRAMetadata().addTag("foo", "bar").getAsMD(Ctx);
- MDNode *MMRA2 = MMRAMetadata().addTag("bar", "bux").getAsMD(Ctx);
+ MDNode *MMRA1 = MMRAMetadata::getTagMD(Ctx, "foo", "bar");
+ MDNode *MMRA2 = MMRAMetadata::getTagMD(Ctx, "bar", "bux");
MI->setMemRefs(*MF, MMOs);
MI->setPreInstrSymbol(*MF, Sym1);
@@ -416,7 +416,7 @@ TEST(MachineInstrExtraInfo, RemoveExtraInfo) {
MDNode *HAM = MDNode::getDistinct(Ctx, std::nullopt);
MDNode *PCS = MDNode::getDistinct(Ctx, std::nullopt);
- MDNode *MMRA = MMRAMetadata().getAsMD(Ctx);
+ MDNode *MMRA = MDTuple::get(Ctx, {});
MI->setMemRefs(*MF, MMOs);
MI->setPreInstrSymbol(*MF, Sym1);
diff --git a/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp b/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
index 801b7567a1268d..30f80ff10b10a5 100644
--- a/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
+++ b/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
@@ -16,22 +16,19 @@ using namespace llvm;
namespace {
-TEST(MMRATest, Order) {
- MMRAMetadata A, B;
-
- std::array<MMRAMetadata::TagT, 5> Tags{{{"opencl-fence-mem", "local"},
- {"opencl-fence-mem", "global"},
- {"foo", "0"},
- {"foo", "2"},
- {"foo", "4"}}};
-
- // Test that ordering does not matter.
- for (unsigned K : {0, 2, 3, 1, 4})
- A.addTag(Tags[K]);
- for (unsigned K : {2, 3, 0, 4, 1})
- B.addTag(Tags[K]);
-
- EXPECT_EQ(A, B);
+void checkMMRA(const MMRAMetadata &MMRA,
+ ArrayRef<MMRAMetadata::TagT> Expected) {
+ EXPECT_EQ(MMRA.size(), Expected.size());
+ for (const auto &E : Expected)
+ EXPECT_TRUE(MMRA.hasTag(E.first, E.second));
+}
+
+MMRAMetadata createFromMD(LLVMContext &Ctx,
+ ArrayRef<MMRAMetadata::TagT> Expected) {
+ SmallVector<Metadata *> MD;
+ for (const auto &Tag : Expected)
+ MD.push_back(MMRAMetadata::getTagMD(Ctx, Tag));
+ return MDTuple::get(Ctx, MD);
}
TEST(MMRATest, MDParse) {
@@ -43,8 +40,7 @@ TEST(MMRATest, MDParse) {
MDTuple::get(Ctx, {MDString::get(Ctx, "foo"), MDString::get(Ctx, "bar")});
MMRAMetadata FooBarMMRA(FooBar);
- EXPECT_EQ(FooBarMMRA.size(), 1u);
- EXPECT_EQ(FooBarMMRA, MMRAMetadata().addTag("foo", "bar"));
+ checkMMRA(FooBarMMRA, {{"foo", "bar"}});
// Nested:
// !{!{!"foo", "!bar"}, !{!"bux", !"qux"}}
@@ -53,98 +49,28 @@ TEST(MMRATest, MDParse) {
MDNode *Nested = MDTuple::get(Ctx, {FooBar, BuxQux});
MMRAMetadata NestedMMRA(Nested);
- EXPECT_EQ(NestedMMRA.size(), 2u);
- EXPECT_EQ(NestedMMRA,
- MMRAMetadata().addTag("foo", "bar").addTag("bux", "qux"));
-}
-
-TEST(MMRATest, MDEmit) {
- LLVMContext Ctx;
-
- // Simple MD.
- // !{!"foo", "!bar"}
- {
- MMRAMetadata FooBarMMRA = MMRAMetadata().addTag("foo", "bar");
- MDTuple *FooBar = FooBarMMRA.getAsMD(Ctx);
-
- ASSERT_NE(FooBar, nullptr);
- ASSERT_EQ(FooBar->getNumOperands(), 2u);
- MDString *Foo = dyn_cast<MDString>(FooBar->getOperand(0));
- MDString *Bar = dyn_cast<MDString>(FooBar->getOperand(1));
- ASSERT_NE(Foo, nullptr);
- ASSERT_NE(Bar, nullptr);
- EXPECT_EQ(Foo->getString(), std::string("foo"));
- EXPECT_EQ(Bar->getString(), std::string("bar"));
- }
-
- // Nested MD
- // !{!{!"foo", "!bar"}, !{!"bux", !"qux"}}
- {
- MMRAMetadata NestedMMRA =
- MMRAMetadata().addTag("foo", "bar").addTag("bux", "qux");
- MDTuple *Nested = NestedMMRA.getAsMD(Ctx);
-
- ASSERT_NE(Nested, nullptr);
- ASSERT_EQ(Nested->getNumOperands(), 2u);
- MDTuple *BuxQux = dyn_cast<MDTuple>(Nested->getOperand(0));
- MDTuple *FooBar = dyn_cast<MDTuple>(Nested->getOperand(1));
- ASSERT_NE(FooBar, nullptr);
- ASSERT_NE(BuxQux, nullptr);
- ASSERT_EQ(FooBar->getNumOperands(), 2u);
- ASSERT_EQ(BuxQux->getNumOperands(), 2u);
-
- MDString *Foo = dyn_cast<MDString>(FooBar->getOperand(0));
- MDString *Bar = dyn_cast<MDString>(FooBar->getOperand(1));
- MDString *Bux = dyn_cast<MDString>(BuxQux->getOperand(0));
- MDString *Qux = dyn_cast<MDString>(BuxQux->getOperand(1));
-
- EXPECT_EQ(Foo->getString(), std::string("foo"));
- EXPECT_EQ(Bar->getString(), std::string("bar"));
- EXPECT_EQ(Bux->getString(), std::string("bux"));
- EXPECT_EQ(Qux->getString(), std::string("qux"));
- }
+ checkMMRA(NestedMMRA, {{"foo", "bar"}, {"bux", "qux"}});
}
TEST(MMRATest, Utility) {
- using TagT = MMRAMetadata::TagT;
-
- MMRAMetadata MMRA;
- MMRA.addTag("foo", "0");
- MMRA.addTag("foo", "1");
- MMRA.addTag("bar", "x");
+ LLVMContext Ctx;
+ MMRAMetadata MMRA =
+ createFromMD(Ctx, {{"foo", "0"}, {"foo", "1"}, {"bar", "x"}});
EXPECT_TRUE(MMRA.hasTagWithPrefix("foo"));
EXPECT_TRUE(MMRA.hasTagWithPrefix("bar"));
EXPECT_FALSE(MMRA.hasTagWithPrefix("x"));
EXPECT_TRUE(MMRA.hasTag("foo", "0"));
- EXPECT_TRUE(MMRA.hasTag(TagT("foo", "0")));
EXPECT_TRUE(MMRA.hasTag("foo", "1"));
- EXPECT_TRUE(MMRA.hasTag(TagT("foo", "1")));
EXPECT_TRUE(MMRA.hasTag("bar", "x"));
- EXPECT_TRUE(MMRA.hasTag(TagT("bar", "x")));
-
- auto AllFoo = MMRA.getAllTagsWithPrefix("foo");
- auto AllBar = MMRA.getAllTagsWithPrefix("bar");
- auto AllX = MMRA.getAllTagsWithPrefix("x");
-
- EXPECT_EQ(AllFoo.size(), 2u);
- EXPECT_EQ(AllBar.size(), 1u);
- EXPECT_EQ(AllX.size(), 0u);
-
- EXPECT_TRUE(is_contained(AllFoo, TagT("foo", "0")));
- EXPECT_TRUE(is_contained(AllFoo, TagT("foo", "1")));
- EXPECT_TRUE(is_contained(AllBar, TagT("bar", "x")));
}
TEST(MMRATest, Operators) {
- MMRAMetadata A;
- A.addTag("foo", "0");
- A.addTag("bar", "x");
+ LLVMContext Ctx;
- MMRAMetadata B;
- B.addTag("foo", "0");
- B.addTag("bar", "y");
+ MMRAMetadata A = createFromMD(Ctx, {{"foo", "0"}, {"bar", "x"}});
+ MMRAMetadata B = createFromMD(Ctx, {{"foo", "0"}, {"bar", "y"}});
// ensure we have different objects by creating copies.
EXPECT_EQ(MMRAMetadata(A), MMRAMetadata(A));
@@ -164,18 +90,13 @@ TEST(MMRATest, Operators) {
}
TEST(MMRATest, Compatibility) {
- MMRAMetadata Foo0;
- Foo0.addTag("foo", "0");
-
- MMRAMetadata Foo1;
- Foo1.addTag("foo", "1");
+ LLVMContext Ctx;
- MMRAMetadata Foo10;
- Foo10.addTag("foo", "0");
- Foo10.addTag("foo", "1");
+ MMRAMetadata Foo0 = createFromMD(Ctx, {{"foo", "0"}});
+ MMRAMetadata Foo1 = createFromMD(Ctx, {{"foo", "1"}});
+ MMRAMetadata Foo10 = createFromMD(Ctx, {{"foo", "0"}, {"foo", "1"}});
- MMRAMetadata Bar;
- Bar.addTag("bar", "y");
+ MMRAMetadata Bar = createFromMD(Ctx, {{"bar", "y"}});
MMRAMetadata Empty;
@@ -203,20 +124,12 @@ TEST(MMRATest, Compatibility) {
EXPECT_TRUE(Foo10.isCompatibleWith(Foo1));
// Try with more prefixes now:
- MMRAMetadata Multiple0;
- Multiple0.addTag("foo", "y");
- Multiple0.addTag("foo", "x");
- Multiple0.addTag("bar", "z");
-
- MMRAMetadata Multiple1;
- Multiple1.addTag("foo", "z");
- Multiple1.addTag("foo", "x");
- Multiple1.addTag("bar", "y");
-
- MMRAMetadata Multiple2;
- Multiple2.addTag("foo", "z");
- Multiple2.addTag("foo", "x");
- Multiple2.addTag("bux", "y");
+ MMRAMetadata Multiple0 =
+ createFromMD(Ctx, {{"foo", "y"}, {"foo", "x"}, {"bar", "z"}});
+ MMRAMetadata Multiple1 =
+ createFromMD(Ctx, {{"foo", "z"}, {"foo", "x"}, {"bar", "y"}});
+ MMRAMetadata Multiple2 =
+ createFromMD(Ctx, {{"foo", "z"}, {"foo", "x"}, {"bux", "y"}});
// Multiple0 and Multiple1 are not compatible because "bar" is getting in the
// way.
@@ -237,47 +150,40 @@ TEST(MMRATest, Compatibility) {
}
TEST(MMRATest, Combine) {
- MMRAMetadata Foo0;
- Foo0.addTag("foo", "0");
-
- MMRAMetadata Foo10;
- Foo10.addTag("foo", "0");
- Foo10.addTag("foo", "1");
-
- MMRAMetadata Bar0;
- Bar0.addTag("bar", "0");
+ LLVMContext Ctx;
- MMRAMetadata BarFoo0;
- BarFoo0.addTag("bar", "0");
- BarFoo0.addTag("foo", "0");
+ MMRAMetadata Foo0 = createFromMD(Ctx, {{"foo", "0"}});
+ MMRAMetadata Foo10 = createFromMD(Ctx, {{"foo", "0"}, {"foo", "1"}});
+ MMRAMetadata Bar0 = createFromMD(Ctx, {{"bar", "0"}});
+ MMRAMetadata BarFoo0 = createFromMD(Ctx, {{"bar", "0"}, {"foo", "0"}});
{
// foo is common to both sets
- MMRAMetadata Combined = Foo0.combine(Foo10);
+ MMRAMetadata Combined = MMRAMetadata::combine(Ctx, Foo0, Foo10);
EXPECT_EQ(Combined, Foo10);
}
{
// nothing is common
- MMRAMetadata Combined = Foo0.combine(Bar0);
+ MMRAMetadata Combined = MMRAMetadata::combine(Ctx, Foo0, Bar0);
EXPECT_TRUE(Combined.empty());
}
{
// only foo is common.
- MMRAMetadata Combined = BarFoo0.combine(Foo0);
+ MMRAMetadata Combined = MMRAMetadata::combine(Ctx, BarFoo0, Foo0);
EXPECT_EQ(Combined, Foo0);
}
{
// only bar is common.
- MMRAMetadata Combined = BarFoo0.combine(Bar0);
+ MMRAMetadata Combined = MMRAMetadata::combine(Ctx, BarFoo0, Bar0);
EXPECT_EQ(Combined, Bar0);
}
{
// only foo is common
- MMRAMetadata Combined = BarFoo0.combine(Foo10);
+ MMRAMetadata Combined = MMRAMetadata::combine(Ctx, BarFoo0, Foo10);
EXPECT_EQ(Combined, Foo10);
}
}
>From 43d76ba056b7e003b2c81a9a742f227f4a76234c Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Tue, 2 Apr 2024 10:50:55 +0200
Subject: [PATCH 3/4] Add getMD API
It's useful for clang codegen.
---
.../IR/MemoryModelRelaxationAnnotations.h | 7 +++++++
.../IR/MemoryModelRelaxationAnnotations.cpp | 13 ++++++++++++
.../MemoryModelRelaxationAnnotationsTest.cpp | 21 +++++++++++++++++++
3 files changed, 41 insertions(+)
diff --git a/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h b/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
index 64b60728bc3e5d..d7e35191796765 100644
--- a/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
+++ b/llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
@@ -24,6 +24,9 @@
namespace llvm {
+template<typename T>
+class ArrayRef;
+
class MDNode;
class MDTuple;
class Metadata;
@@ -72,6 +75,10 @@ class MMRAMetadata {
return getTagMD(Ctx, T.first, T.second);
}
+ /// Creates !mmra metadata from \p Tags.
+ /// \returns nullptr or a MDTuple* from \p Tags.
+ static MDTuple *getMD(LLVMContext &Ctx, ArrayRef<TagT> Tags);
+
/// \returns true if \p MD is a well-formed MMRA tag.
static bool isTagMD(const Metadata *MD);
diff --git a/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp b/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
index d2d2deb207bc8e..a9e14d81a01417 100644
--- a/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
+++ b/llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp
@@ -62,6 +62,19 @@ MDTuple *MMRAMetadata::getTagMD(LLVMContext &Ctx, StringRef Prefix,
{MDString::get(Ctx, Prefix), MDString::get(Ctx, Suffix)});
}
+MDTuple *MMRAMetadata::getMD(LLVMContext &Ctx, ArrayRef<MMRAMetadata::TagT> Tags) {
+ if (Tags.empty())
+ return nullptr;
+
+ if (Tags.size() == 1)
+ return getTagMD(Ctx, Tags.front());
+
+ SmallVector<Metadata*> MMRAs;
+ for(const auto &Tag: Tags)
+ MMRAs.push_back(getTagMD(Ctx, Tag));
+ return MDTuple::get(Ctx, MMRAs);
+}
+
MDNode *MMRAMetadata::combine(LLVMContext &Ctx, const MMRAMetadata &A,
const MMRAMetadata &B) {
// Let A and B be two tags set, and U be the prefix-wise union of A and B.
diff --git a/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp b/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
index 30f80ff10b10a5..13e52909aead4a 100644
--- a/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
+++ b/llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp
@@ -52,6 +52,27 @@ TEST(MMRATest, MDParse) {
checkMMRA(NestedMMRA, {{"foo", "bar"}, {"bux", "qux"}});
}
+TEST(MMRATest, GetMD) {
+ LLVMContext Ctx;
+
+ EXPECT_EQ(MMRAMetadata::getMD(Ctx, {}), nullptr);
+
+ MDTuple* SingleMD = MMRAMetadata::getMD(Ctx, {{"foo", "bar"}});
+ EXPECT_EQ(SingleMD->getNumOperands(), 2);
+ EXPECT_EQ(cast<MDString>(SingleMD->getOperand(0))->getString(), "foo");
+ EXPECT_EQ(cast<MDString>(SingleMD->getOperand(1))->getString(), "bar");
+
+ MDTuple* MultiMD = MMRAMetadata::getMD(Ctx, {{"foo", "bar"}, {"bux", "qux"}});
+ EXPECT_EQ(MultiMD->getNumOperands(), 2);
+
+ MDTuple* FooBar = cast<MDTuple>(MultiMD->getOperand(0));
+ EXPECT_EQ(cast<MDString>(FooBar->getOperand(0))->getString(), "foo");
+ EXPECT_EQ(cast<MDString>(FooBar->getOperand(1))->getString(), "bar");
+ MDTuple* BuxQux = cast<MDTuple>(MultiMD->getOperand(1));
+ EXPECT_EQ(cast<MDString>(BuxQux->getOperand(0))->getString(), "bux");
+ EXPECT_EQ(cast<MDString>(BuxQux->getOperand(1))->getString(), "qux");
+}
+
TEST(MMRATest, Utility) {
LLVMContext Ctx;
MMRAMetadata MMRA =
>From 36bcf92bf7187b07a3842d3077b9f28c4444556e Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Wed, 17 Jan 2024 11:26:57 +0100
Subject: [PATCH 4/4] [RFC][AMDGPU] Add OpenCL-specific fence address space
masks
Using MMRAs, implement `builtin_amdgcn_fence_opencl` to allow
device libs to emit fences that only target one or more address spaces, instead of fencing all address spaces at once.
---
clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 +
clang/lib/CodeGen/CGBuiltin.cpp | 28 +
clang/lib/CodeGen/CodeGenFunction.h | 2 +
clang/lib/Sema/SemaChecking.cpp | 7 +-
.../builtin-amdgcn-fence-opencl.cpp | 108 ++
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 32 +-
.../memory-legalizer-fence-mmra-global.ll | 1428 +++++++++++++++++
.../memory-legalizer-fence-mmra-local.ll | 1188 ++++++++++++++
.../memory-legalizer-fence-mmra-private.ll | 1188 ++++++++++++++
9 files changed, 3973 insertions(+), 9 deletions(-)
create mode 100644 clang/test/CodeGenCXX/builtin-amdgcn-fence-opencl.cpp
create mode 100644 llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-private.ll
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index c660582cc98e66..f26c3c8d58df38 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -68,6 +68,7 @@ BUILTIN(__builtin_amdgcn_iglp_opt, "vIi", "n")
BUILTIN(__builtin_amdgcn_s_dcache_inv, "v", "n")
BUILTIN(__builtin_amdgcn_buffer_wbinvl1, "v", "n")
BUILTIN(__builtin_amdgcn_fence, "vUicC*", "n")
+BUILTIN(__builtin_amdgcn_fence_opencl, "vUiUicC*", "n")
BUILTIN(__builtin_amdgcn_groupstaticsize, "Ui", "n")
BUILTIN(__builtin_amdgcn_wavefrontsize, "Ui", "nc")
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index bb007231c0b783..22677e07c15ad2 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -56,6 +56,7 @@
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/MatrixBuilder.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/Support/ConvertUTF.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/ScopedPrinter.h"
@@ -18319,6 +18320,26 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
return nullptr;
}
+void CodeGenFunction::AddAMDGCNAddressSpaceMMRA(llvm::Instruction *Inst,
+ llvm::Value *ASMask) {
+ constexpr const char *Tag = "opencl-fence-mem";
+
+ uint64_t Mask = cast<llvm::ConstantInt>(ASMask)->getZExtValue();
+ if (Mask == 0)
+ return;
+
+ // 3 bits can be set: local, global, image in that order.
+ LLVMContext &Ctx = Inst->getContext();
+ SmallVector<MMRAMetadata::TagT, 3> MMRAs;
+ if (Mask & (1 << 0))
+ MMRAs.push_back({Tag, "local"});
+ if (Mask & (1 << 1))
+ MMRAs.push_back({Tag, "global"});
+ if (Mask & (1 << 2))
+ MMRAs.push_back({Tag, "image"});
+ Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
@@ -18991,6 +19012,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
EmitScalarExpr(E->getArg(1)), AO, SSID);
return Builder.CreateFence(AO, SSID);
}
+ case AMDGPU::BI__builtin_amdgcn_fence_opencl: {
+ ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(1)),
+ EmitScalarExpr(E->getArg(2)), AO, SSID);
+ FenceInst *Fence = Builder.CreateFence(AO, SSID);
+ AddAMDGCNAddressSpaceMMRA(Fence, EmitScalarExpr(E->getArg(0)));
+ return Fence;
+ }
case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
case AMDGPU::BI__builtin_amdgcn_atomic_inc64:
case AMDGPU::BI__builtin_amdgcn_atomic_dec32:
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index e2a7e28c8211ea..e52a20a2b89bda 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4542,6 +4542,8 @@ class CodeGenFunction : public CodeGenTypeCache {
llvm::Value *EmitHexagonBuiltinExpr(unsigned BuiltinID, const CallExpr *E);
llvm::Value *EmitRISCVBuiltinExpr(unsigned BuiltinID, const CallExpr *E,
ReturnValueSlot ReturnValue);
+
+ void AddAMDGCNAddressSpaceMMRA(llvm::Instruction *Inst, llvm::Value *ASMask);
void ProcessOrderScopeAMDGCN(llvm::Value *Order, llvm::Value *Scope,
llvm::AtomicOrdering &AO,
llvm::SyncScope::ID &SSID);
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 11401b6f56c0ea..f4fa9bdfe71d29 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -5681,6 +5681,10 @@ bool Sema::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID,
OrderIndex = 0;
ScopeIndex = 1;
break;
+ case AMDGPU::BI__builtin_amdgcn_fence_opencl:
+ OrderIndex = 1;
+ ScopeIndex = 2;
+ break;
default:
return false;
}
@@ -5703,7 +5707,8 @@ bool Sema::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID,
switch (static_cast<llvm::AtomicOrderingCABI>(Ord)) {
case llvm::AtomicOrderingCABI::relaxed:
case llvm::AtomicOrderingCABI::consume:
- if (BuiltinID == AMDGPU::BI__builtin_amdgcn_fence)
+ if (BuiltinID == AMDGPU::BI__builtin_amdgcn_fence ||
+ BuiltinID == AMDGPU::BI__builtin_amdgcn_fence_opencl)
return Diag(ArgExpr->getBeginLoc(),
diag::warn_atomic_op_has_invalid_memory_order)
<< 0 << ArgExpr->getSourceRange();
diff --git a/clang/test/CodeGenCXX/builtin-amdgcn-fence-opencl.cpp b/clang/test/CodeGenCXX/builtin-amdgcn-fence-opencl.cpp
new file mode 100644
index 00000000000000..7b93fbf335a713
--- /dev/null
+++ b/clang/test/CodeGenCXX/builtin-amdgcn-fence-opencl.cpp
@@ -0,0 +1,108 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --version 3
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 %s -emit-llvm -O0 -o - \
+// RUN: -triple=amdgcn-amd-amdhsa -Qn -mcode-object-version=none | FileCheck %s
+
+#define LOCAL_MASK (1 << 0)
+#define GLOBAL_MASK (1 << 1)
+#define IMAGE_MASK (1 << 2)
+
+//.
+// CHECK: @.str = private unnamed_addr addrspace(4) constant [10 x i8] c"workgroup\00", align 1
+// CHECK: @.str.1 = private unnamed_addr addrspace(4) constant [6 x i8] c"agent\00", align 1
+// CHECK: @.str.2 = private unnamed_addr addrspace(4) constant [1 x i8] zeroinitializer, align 1
+//.
+// CHECK-LABEL: define dso_local void @_Z10test_localv(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: fence syncscope("workgroup") seq_cst, !mmra [[META1:![0-9]+]]
+// CHECK-NEXT: fence syncscope("agent") acquire, !mmra [[META1]]
+// CHECK-NEXT: fence seq_cst, !mmra [[META1]]
+// CHECK-NEXT: fence syncscope("agent") acq_rel, !mmra [[META1]]
+// CHECK-NEXT: fence syncscope("workgroup") release, !mmra [[META1]]
+// CHECK-NEXT: ret void
+//
+void test_local() {
+
+ __builtin_amdgcn_fence_opencl(LOCAL_MASK, __ATOMIC_SEQ_CST, "workgroup");
+
+ __builtin_amdgcn_fence_opencl(LOCAL_MASK,__ATOMIC_ACQUIRE, "agent");
+
+ __builtin_amdgcn_fence_opencl(LOCAL_MASK,__ATOMIC_SEQ_CST, "");
+
+ __builtin_amdgcn_fence_opencl(LOCAL_MASK, 4, "agent");
+
+ __builtin_amdgcn_fence_opencl(LOCAL_MASK, 3, "workgroup");
+}
+
+// CHECK-LABEL: define dso_local void @_Z11test_globalv(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: fence syncscope("workgroup") seq_cst, !mmra [[META2:![0-9]+]]
+// CHECK-NEXT: fence syncscope("agent") acquire, !mmra [[META2]]
+// CHECK-NEXT: fence seq_cst, !mmra [[META2]]
+// CHECK-NEXT: fence syncscope("agent") acq_rel, !mmra [[META2]]
+// CHECK-NEXT: fence syncscope("workgroup") release, !mmra [[META2]]
+// CHECK-NEXT: ret void
+//
+void test_global() {
+
+ __builtin_amdgcn_fence_opencl(GLOBAL_MASK, __ATOMIC_SEQ_CST, "workgroup");
+
+ __builtin_amdgcn_fence_opencl(GLOBAL_MASK,__ATOMIC_ACQUIRE, "agent");
+
+ __builtin_amdgcn_fence_opencl(GLOBAL_MASK,__ATOMIC_SEQ_CST, "");
+
+ __builtin_amdgcn_fence_opencl(GLOBAL_MASK, 4, "agent");
+
+ __builtin_amdgcn_fence_opencl(GLOBAL_MASK, 3, "workgroup");
+}
+
+// CHECK-LABEL: define dso_local void @_Z10test_imagev(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: fence syncscope("workgroup") seq_cst, !mmra [[META3:![0-9]+]]
+// CHECK-NEXT: fence syncscope("agent") acquire, !mmra [[META3]]
+// CHECK-NEXT: fence seq_cst, !mmra [[META2]]
+// CHECK-NEXT: fence syncscope("agent") acq_rel, !mmra [[META3]]
+// CHECK-NEXT: fence syncscope("workgroup") release, !mmra [[META3]]
+// CHECK-NEXT: ret void
+//
+void test_image() {
+
+ __builtin_amdgcn_fence_opencl(IMAGE_MASK, __ATOMIC_SEQ_CST, "workgroup");
+
+ __builtin_amdgcn_fence_opencl(IMAGE_MASK,__ATOMIC_ACQUIRE, "agent");
+
+ __builtin_amdgcn_fence_opencl(GLOBAL_MASK,__ATOMIC_SEQ_CST, "");
+
+ __builtin_amdgcn_fence_opencl(IMAGE_MASK, 4, "agent");
+
+ __builtin_amdgcn_fence_opencl(IMAGE_MASK, 3, "workgroup");
+}
+
+// CHECK-LABEL: define dso_local void @_Z10test_mixedv(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: fence syncscope("workgroup") seq_cst, !mmra [[META4:![0-9]+]]
+// CHECK-NEXT: fence syncscope("workgroup") seq_cst, !mmra [[META5:![0-9]+]]
+// CHECK-NEXT: fence syncscope("workgroup") seq_cst, !mmra [[META5]]
+// CHECK-NEXT: ret void
+//
+void test_mixed() {
+
+ __builtin_amdgcn_fence_opencl(IMAGE_MASK | GLOBAL_MASK, __ATOMIC_SEQ_CST, "workgroup");
+ __builtin_amdgcn_fence_opencl(IMAGE_MASK | GLOBAL_MASK | LOCAL_MASK, __ATOMIC_SEQ_CST, "workgroup");
+
+ __builtin_amdgcn_fence_opencl(0xFF,__ATOMIC_SEQ_CST, "workgroup");
+}
+//.
+// CHECK: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1]] = !{!"opencl-fence-mem", !"local"}
+// CHECK: [[META2]] = !{!"opencl-fence-mem", !"global"}
+// CHECK: [[META3]] = !{!"opencl-fence-mem", !"image"}
+// CHECK: [[META4]] = !{[[META2]], [[META3]]}
+// CHECK: [[META5]] = !{[[META1]], [[META2]], [[META3]]}
+//.
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 62306fa667b360..0dba136a43a49a 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -21,6 +21,7 @@
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/TargetParser/TargetParser.h"
@@ -2535,12 +2536,29 @@ bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,
AtomicPseudoMIs.push_back(MI);
bool Changed = false;
+ // Refine based on MMRAs. They can override the OrderingAddrSpace
+ auto OrderingAddrSpace = MOI.getOrderingAddrSpace();
+
+ // TODO: Use an enum/parse this sooner?
+ // TODO: Do we need to handle these MMRAs on load/stores/atomicrmw as well?
+ if (auto MMRA = MMRAMetadata(MI->getMMRAMetadata())) {
+ SIAtomicAddrSpace NewAddrSpace = SIAtomicAddrSpace::NONE;
+ if (MMRA.hasTag("opencl-fence-mem", "global"))
+ NewAddrSpace |= SIAtomicAddrSpace::GLOBAL;
+ if (MMRA.hasTag("opencl-fence-mem", "local"))
+ NewAddrSpace |= SIAtomicAddrSpace::LDS;
+ if (MMRA.hasTag("opencl-fence-mem", "image"))
+ NewAddrSpace |= SIAtomicAddrSpace::SCRATCH;
+
+ if (NewAddrSpace != SIAtomicAddrSpace::NONE)
+ OrderingAddrSpace = NewAddrSpace;
+ }
+
if (MOI.isAtomic()) {
if (MOI.getOrdering() == AtomicOrdering::Acquire)
- Changed |= CC->insertWait(MI, MOI.getScope(), MOI.getOrderingAddrSpace(),
- SIMemOp::LOAD | SIMemOp::STORE,
- MOI.getIsCrossAddressSpaceOrdering(),
- Position::BEFORE);
+ Changed |= CC->insertWait(
+ MI, MOI.getScope(), OrderingAddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+ MOI.getIsCrossAddressSpaceOrdering(), Position::BEFORE);
if (MOI.getOrdering() == AtomicOrdering::Release ||
MOI.getOrdering() == AtomicOrdering::AcquireRelease ||
@@ -2552,8 +2570,7 @@ bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,
/// generate a fence. Could add support in this file for
/// barrier. SIInsertWaitcnt.cpp could then stop unconditionally
/// adding S_WAITCNT before a S_BARRIER.
- Changed |= CC->insertRelease(MI, MOI.getScope(),
- MOI.getOrderingAddrSpace(),
+ Changed |= CC->insertRelease(MI, MOI.getScope(), OrderingAddrSpace,
MOI.getIsCrossAddressSpaceOrdering(),
Position::BEFORE);
@@ -2565,8 +2582,7 @@ bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,
if (MOI.getOrdering() == AtomicOrdering::Acquire ||
MOI.getOrdering() == AtomicOrdering::AcquireRelease ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)
- Changed |= CC->insertAcquire(MI, MOI.getScope(),
- MOI.getOrderingAddrSpace(),
+ Changed |= CC->insertAcquire(MI, MOI.getScope(), OrderingAddrSpace,
Position::BEFORE);
return Changed;
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll
new file mode 100644
index 00000000000000..3e8cba04b306e1
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll
@@ -0,0 +1,1428 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX6 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX10-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+cumode -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX10-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+cumode -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11-CU %s
+
+define amdgpu_kernel void @workgroup_acquire_fence() {
+; GFX6-LABEL: workgroup_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") acquire, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_release_fence() {
+; GFX6-LABEL: workgroup_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") release, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_acq_rel_fence() {
+; GFX6-LABEL: workgroup_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") acq_rel, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_seq_cst_fence() {
+; GFX6-LABEL: workgroup_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") seq_cst, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
+; GFX6-LABEL: workgroup_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") acquire, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_release_fence() {
+; GFX6-LABEL: workgroup_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") release, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
+; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
+; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_acquire_fence() {
+; GFX6-LABEL: agent_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") acquire, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_release_fence() {
+; GFX6-LABEL: agent_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") release, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_acq_rel_fence() {
+; GFX6-LABEL: agent_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") acq_rel, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_seq_cst_fence() {
+; GFX6-LABEL: agent_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") seq_cst, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_acquire_fence() {
+; GFX6-LABEL: agent_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") acquire, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_release_fence() {
+; GFX6-LABEL: agent_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") release, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_acq_rel_fence() {
+; GFX6-LABEL: agent_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_seq_cst_fence() {
+; GFX6-LABEL: agent_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_acquire_fence() {
+; GFX6-LABEL: system_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence acquire, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_release_fence() {
+; GFX6-LABEL: system_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence release, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_acq_rel_fence() {
+; GFX6-LABEL: system_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence acq_rel, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_seq_cst_fence() {
+; GFX6-LABEL: system_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence seq_cst, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_acquire_fence() {
+; GFX6-LABEL: system_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") acquire, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_release_fence() {
+; GFX6-LABEL: system_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") release, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_acq_rel_fence() {
+; GFX6-LABEL: system_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_seq_cst_fence() {
+; GFX6-LABEL: system_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: buffer_wbinvl1
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: buffer_wbinvl1_vol
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: buffer_gl1_inv
+; GFX10-WGP-NEXT: buffer_gl0_inv
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: buffer_gl1_inv
+; GFX10-CU-NEXT: buffer_gl0_inv
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
+; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: buffer_gl1_inv
+; GFX11-WGP-NEXT: buffer_gl0_inv
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: buffer_gl1_inv
+; GFX11-CU-NEXT: buffer_gl0_inv
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"global"}
+ ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll
new file mode 100644
index 00000000000000..1a16f003a05c4d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-local.ll
@@ -0,0 +1,1188 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX6 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX10-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+cumode -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX10-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+cumode -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11-CU %s
+
+define amdgpu_kernel void @workgroup_acquire_fence() {
+; GFX6-LABEL: workgroup_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") acquire, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_release_fence() {
+; GFX6-LABEL: workgroup_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") release, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_acq_rel_fence() {
+; GFX6-LABEL: workgroup_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") acq_rel, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_seq_cst_fence() {
+; GFX6-LABEL: workgroup_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") seq_cst, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
+; GFX6-LABEL: workgroup_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") acquire, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_release_fence() {
+; GFX6-LABEL: workgroup_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") release, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
+; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
+; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_acquire_fence() {
+; GFX6-LABEL: agent_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") acquire, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_release_fence() {
+; GFX6-LABEL: agent_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") release, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_acq_rel_fence() {
+; GFX6-LABEL: agent_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") acq_rel, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_seq_cst_fence() {
+; GFX6-LABEL: agent_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") seq_cst, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_acquire_fence() {
+; GFX6-LABEL: agent_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") acquire, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_release_fence() {
+; GFX6-LABEL: agent_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") release, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_acq_rel_fence() {
+; GFX6-LABEL: agent_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_seq_cst_fence() {
+; GFX6-LABEL: agent_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_acquire_fence() {
+; GFX6-LABEL: system_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence acquire, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_release_fence() {
+; GFX6-LABEL: system_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence release, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_acq_rel_fence() {
+; GFX6-LABEL: system_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence acq_rel, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_seq_cst_fence() {
+; GFX6-LABEL: system_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence seq_cst, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_acquire_fence() {
+; GFX6-LABEL: system_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") acquire, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_release_fence() {
+; GFX6-LABEL: system_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") release, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_acq_rel_fence() {
+; GFX6-LABEL: system_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_seq_cst_fence() {
+; GFX6-LABEL: system_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"local"}
+ ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-private.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-private.ll
new file mode 100644
index 00000000000000..ee8cfb0c5472e8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-private.ll
@@ -0,0 +1,1188 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX6 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX10-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+cumode -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX10-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+cumode -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11-CU %s
+
+define amdgpu_kernel void @workgroup_acquire_fence() {
+; GFX6-LABEL: workgroup_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") acquire, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_release_fence() {
+; GFX6-LABEL: workgroup_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") release, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_acq_rel_fence() {
+; GFX6-LABEL: workgroup_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") acq_rel, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_seq_cst_fence() {
+; GFX6-LABEL: workgroup_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup") seq_cst, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
+; GFX6-LABEL: workgroup_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") acquire, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_release_fence() {
+; GFX6-LABEL: workgroup_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") release, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
+; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
+; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("workgroup-one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_acquire_fence() {
+; GFX6-LABEL: agent_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") acquire, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_release_fence() {
+; GFX6-LABEL: agent_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") release, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_acq_rel_fence() {
+; GFX6-LABEL: agent_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") acq_rel, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_seq_cst_fence() {
+; GFX6-LABEL: agent_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent") seq_cst, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_acquire_fence() {
+; GFX6-LABEL: agent_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") acquire, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_release_fence() {
+; GFX6-LABEL: agent_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") release, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_acq_rel_fence() {
+; GFX6-LABEL: agent_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @agent_one_as_seq_cst_fence() {
+; GFX6-LABEL: agent_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: agent_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: agent_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: agent_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: agent_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: agent_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: agent_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("agent-one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_acquire_fence() {
+; GFX6-LABEL: system_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence acquire, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_release_fence() {
+; GFX6-LABEL: system_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence release, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_acq_rel_fence() {
+; GFX6-LABEL: system_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence acq_rel, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_seq_cst_fence() {
+; GFX6-LABEL: system_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence seq_cst, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_acquire_fence() {
+; GFX6-LABEL: system_one_as_acquire_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_acquire_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_acquire_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_acquire_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_acquire_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_acquire_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_acquire_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") acquire, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_release_fence() {
+; GFX6-LABEL: system_one_as_release_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_release_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_release_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_release_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_release_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_release_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_release_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") release, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_acq_rel_fence() {
+; GFX6-LABEL: system_one_as_acq_rel_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_acq_rel_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_acq_rel_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_acq_rel_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_acq_rel_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_acq_rel_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_acq_rel_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") acq_rel, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
+
+define amdgpu_kernel void @system_one_as_seq_cst_fence() {
+; GFX6-LABEL: system_one_as_seq_cst_fence:
+; GFX6: ; %bb.0: ; %entry
+; GFX6-NEXT: s_endpgm
+;
+; GFX7-LABEL: system_one_as_seq_cst_fence:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT: s_endpgm
+;
+; GFX10-WGP-LABEL: system_one_as_seq_cst_fence:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT: s_endpgm
+;
+; GFX10-CU-LABEL: system_one_as_seq_cst_fence:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT: s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: system_one_as_seq_cst_fence:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT: s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
+;
+; GFX11-WGP-LABEL: system_one_as_seq_cst_fence:
+; GFX11-WGP: ; %bb.0: ; %entry
+; GFX11-WGP-NEXT: s_endpgm
+;
+; GFX11-CU-LABEL: system_one_as_seq_cst_fence:
+; GFX11-CU: ; %bb.0: ; %entry
+; GFX11-CU-NEXT: s_endpgm
+entry:
+ fence syncscope("one-as") seq_cst, !mmra !{!"opencl-fence-mem", !"image"}
+ ret void
+}
More information about the cfe-commits
mailing list