[llvm] [RFC] Memory Model Relaxation Annotations (PR #78569)

Thu Jan 18 04:42:48 PST 2024

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Pierre van Houtryve (Pierre-vh)

<details>
<summary>Changes</summary>

Work-in-progress patch to implement the core/target-agnostic components of Memory Model Relaxation Annotations.

This diff is mostly complete but likely has a few holes, especially in codegen (MMRAs aren't always carried over to MI layer I believe). Most of the work needs to be done in adding tests and analyzing the outputs to find what's missing.

---

Patch is 98.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78569.diff


35 Files Affected:

- (added) llvm/docs/MemoryModelRelaxationAnnotations.rst (+351) 
- (modified) llvm/docs/Reference.rst (+4) 
- (modified) llvm/include/llvm/Analysis/VectorUtils.h (+1-1) 
- (modified) llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h (+9) 
- (modified) llvm/include/llvm/CodeGen/MachineFunction.h (+2-1) 
- (modified) llvm/include/llvm/CodeGen/MachineInstr.h (+37-10) 
- (modified) llvm/include/llvm/CodeGen/MachineInstrBuilder.h (+32-14) 
- (modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+12) 
- (modified) llvm/include/llvm/IR/FixedMetadataKinds.def (+1) 
- (added) llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h (+101) 
- (modified) llvm/lib/Analysis/VectorUtils.cpp (+10-1) 
- (modified) llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp (+1) 
- (modified) llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp (+3-1) 
- (modified) llvm/lib/CodeGen/MIRPrinter.cpp (+7) 
- (modified) llvm/lib/CodeGen/MachineFunction.cpp (+2-2) 
- (modified) llvm/lib/CodeGen/MachineInstr.cpp (+37-12) 
- (modified) llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp (+9) 
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+1-1) 
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+11-6) 
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp (+7) 
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp (+2) 
- (modified) llvm/lib/IR/CMakeLists.txt (+1) 
- (modified) llvm/lib/IR/Instruction.cpp (+13-1) 
- (added) llvm/lib/IR/MemoryModelRelaxationAnnotations.cpp (+201) 
- (modified) llvm/lib/IR/Verifier.cpp (+27) 
- (modified) llvm/lib/Transforms/Scalar/GVNHoist.cpp (+2-1) 
- (modified) llvm/lib/Transforms/Utils/FunctionComparator.cpp (+5) 
- (modified) llvm/lib/Transforms/Utils/Local.cpp (+14-1) 
- (added) llvm/test/CodeGen/AMDGPU/GlobalISel/mmra.ll (+49) 
- (added) llvm/test/CodeGen/AMDGPU/mmra.ll (+254) 
- (added) llvm/test/Verifier/mmra-allowed.ll (+33) 
- (added) llvm/test/Verifier/mmra.ll (+37) 
- (modified) llvm/unittests/CodeGen/MachineInstrTest.cpp (+51) 
- (modified) llvm/unittests/IR/CMakeLists.txt (+1) 
- (added) llvm/unittests/IR/MemoryModelRelaxationAnnotationsTest.cpp (+34) 


``````````diff

diff --git a/llvm/docs/MemoryModelRelaxationAnnotations.rst b/llvm/docs/MemoryModelRelaxationAnnotations.rst
new file mode 100644
index 000000000000000..3ec0ddaf289a881
--- /dev/null
+++ b/llvm/docs/MemoryModelRelaxationAnnotations.rst
@@ -0,0 +1,351 @@
+
+===================================
+Memory Model Relaxation Annotations
+===================================
+.. contents::
+   :local:
+Introduction
+============
+Memory Model Relaxation Annotations (MMRAs) are target-defined properties
+on instructions that can be used to selectively relax constraints placed
+by the memory model. For example:
+* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
+  memory operations to be reordered across ``acquire`` or ``release``
+  operations.
+* OpenCL APIs expose primitives to only fence a specific set of address
+  spaces, carrying that information to the backend can enable the
+  use of faster synchronization instructions, rather than fencing all
+  address spaces.
+MMRAs offer an opt-in system for targets to relax the default LLVM
+memory model.
+As such, they are attached to an operation using LLVM metadata which
+can always be dropped without affecting correctness.
+Definitions
+===========
+memory operation
+    A load, a store, an atomic, or a function call that is marked as
+    accessing memory.
+synchronizing operation
+    An instruction that synchronizes memory with other threads (e.g.
+    an atomic or a fence).
+tag
+    Metadata attached to a memory or synchronizing operation
+    that represents some target-defined property regarding memory
+    synchronization.
+    An operation may have multiple tags that each represent a different
+    property.
+    A tag is composed of a pair of metadata nodes:
+    * a *prefix* string.
+    * a *suffix* integer or string.
+    In LLVM IR, the pair is represented using a metadata tuple.
+    In other cases (comments, documentation, etc.), we may use the
+    ``prefix:suffix`` notation.
+    For example:
+    .. code-block::
+      :caption: Example: Tags in Metadata
+      !0 = !{!"scope", !"workgroup"}  # scope:workgroup
+      !1 = !{!"scope", !"device"}     # scope:device
+      !2 = !{!"scope", !"system"}     # scope:system
+      !3 = !{!"sync-as", i32 2}  # sync-as:2
+      !4 = !{!"sync-as", i32 1}  # sync-as:1
+      !5 = !{!"sync-as", i32 0}  # sync-as:0
+    .. note::
+      The only semantics relevant to the optimizer is the
+      "compatibility" relation defined below. All other
+      semantics are target defined.
+    Tags can also be organised in lists to allow operations
+    to specify all of the tags they belong to. Such a list
+    is referred to as a "set of tags".
+    .. code-block::
+      :caption: Example: Set of Tags in Metadata
+      !0 = !{!"scope", !"workgroup"}
+      !1 = !{!"sync-as", !"private"}
+      !2 = !{!0, !2}
+    .. note::
+      If an operation does not have MMRA metadata, it's treated as if
+      it has an empty list (``!{}``) of tags.
+    Note that it is not an error if a tag is not recognized by the
+    instruction it is applied to, or by the current target.
+    Such tags are simply ignored.
+    Both synchronizing operations and memory operations can have
+    zero or more tags attached to them using the ``!mmra`` syntax.
+    For the sake of readability in examples below,
+    we use a (non-functional) short syntax to represent MMMRA metadata:
+    .. code-block::
+      :caption: Short Syntax Example
+      store %ptr1 # foo:bar
+      store %ptr1 !mmra !{!"foo", !"bar"}
+    These two notations can be used in this document and are strictly
+    equivalent. However, only the second version is functional.
+compatibility
+    Two sets of tags are said to be *compatible* iff, for every unique
+    tag prefix P present in at least one set:
+    - the other set contains no tag with prefix P, or
+    - at least one tag with prefix P is common to both sets.
+    The above definition implies that an empty set is always compatible
+    with any other set. This is an important property as it ensures that
+    if a transform drops the metadata on an operation, it can never affect
+    correctness. In other words, the memory model cannot be relaxed further
+    by deleting metadata from instructions.
+.. _HappensBefore:
+The *happens-before* Relation
+==============================
+Compatibility checks can be used to opt out of the *happens-before* relation
+established between two instructions.
+Ordering
+    When two instructions' metadata are not compatible, any program order
+    between them are not in *happens-before*.
+    For example, consider two tags ``foo:bar`` and
+    ``foo:baz`` exposed by a target:
+    .. code-block::
+       A: store %ptr1                 # foo:bar
+       B: store %ptr2                 # foo:baz
+       X: store atomic release %ptr3  # foo:bar
+    In the above figure, ``A`` is compatible with ``X``, and hence ``A``
+    happens-before ``X``. But ``B`` is not compatible with
+    ``X``, and hence it is not happens-before ``X``.
+Synchronization
+    If an synchronizing operation has one or more tags, then whether it
+    participate in the  ``seq_cst`` order with other operations is target
+    dependent.
+    .. code-block::
+       ; Depending on the semantics of foo:bar & foo:bux, this may not
+       ; synchronize with another sequence.
+       fence release               # foo:bar
+       store atomic %ptr1          # foo:bux
+Examples
+--------
+.. code-block:: text
+  :caption: Example 1
+   A: store ptr addrspace(1) %ptr2                  # sync-as:1 vulkan:nonprivate
+   B: store atomic release ptr addrspace(1) %ptr3   # sync-as:0 vulkan:nonprivate
+A and B are not ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
+e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to
+only synchronize with operations happening in ``addrspace(0)``.
+.. code-block:: text
+  :caption: Example 2
+   A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
+   B: store atomic release ptr addrspace(1) %ptr3  # sync-as:1 vulkan:nonprivate
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+Note that A and B may or may not be in *happens-before* due to other reasons.
+.. code-block:: text
+  :caption: Example 3
+   A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
+   B: store atomic release ptr addrspace(1) %ptr3  # vulkan:nonprivate
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+.. code-block:: text
+  :caption: Example 3
+   A: store ptr addrspace(1) %ptr2                 # sync-as:1
+   B: store atomic release ptr addrspace(1) %ptr3  # sync-as:2
+A and B do not have to be ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+Use-cases
+=========
+SPIRV ``NonPrivatePointer``
+---------------------------
+MMRAs can support the SPIRV capability
+``VulkanMemoryModel``, where synchronizing operations only affect
+memory operations that specify ``NonPrivatePointer`` semantics.
+The example below is generated from a SPIRV program using the
+following recipe:
+- Add ``vulkan:nonprivate`` to every synchronizing operation.
+- Add ``vulkan:nonprivate`` to every non-atomic memory operation
+  that is marked ``NonPrivatePointer``.
+- Add ``vulkan:private`` to tags of every non-atomic memory operation
+  that is not marked ``NonPrivatePointer``.
+.. code-block::
+   Thread T1:
+    A: store %ptr1                 # vulkan:nonprivate
+    B: store %ptr2                 # vulkan:private
+    X: store atomic release %ptr3  # vulkan:nonprivate
+   Thread T2:
+    Y: load atomic acquire %ptr3   # vulkan:nonprivate
+    C: load %ptr2                  # vulkan:private
+    D: load %ptr1                  # vulkan:nonprivate
+Compatibility ensures that operation ``A`` is ordered
+relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
+If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
+No such relation can be inferred about operations ``B`` and ``C``.
+.. note::
+   The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
+   considers all atomic operation non-private.
+   Whether ``vulkan:nonprivate`` would be specified on atomic operations is
+   an implementation detail, as an atomic operation is always ``nonprivate``.
+   The implementation may choose to be explicit and emit IR with
+   ``vulkan:nonprivate`` on every atomic operation, or it could choose to
+   only emit ``vulkan::private`` and assume ``vulkan:nonprivate``
+   by default.
+Operations marked with ``vulkan:private`` effectively opt out of the
+happens-before order in a SPIRV program since they are incompatible
+with every synchronizing operation. Note that SPIRV operations that
+are not marked ``NonPrivatePointer`` are not entirely private to the
+thread --- they are implicitly synchronized at the start or end of a
+thread by the Vulkan *system-synchronizes-with* relationship. This
+example assumes that the target-defined semantics of
+``vulkan:private`` correctly implements this property.
+This scheme is general enough to express the interoperability of SPIRV
+programs with other environments.
+.. code-block::
+   Thread T1:
+   A: store %ptr1                 # vulkan:nonprivate
+   X: store atomic release %ptr2  # vulkan:nonprivate
+   Thread T2:
+   Y: load atomic acquire %ptr2   # foo:bar
+   B: load %ptr1
+In the above example, thread ``T1`` originates from a SPIRV program
+while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
+can synchronize with ``Y`` is target defined.  If ``X`` synchronizes
+with ``Y``, then ``A`` happens before ``B`` (because A/X and
+Y/B are compatible).
+Implementation Example
+~~~~~~~~~~~~~~~~~~~~~~
+Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
+where all memory operations are cached, and the entire cache is
+flushed or invalidated at a ``release`` or ``acquire`` respectively. A
+possible scheme is that when translating a SPIRV program, memory
+operations marked ``NonPrivatePointer`` should not be cached, and the
+cache contents should not be touched during an ``acquire`` and
+``release`` operation.
+This could be implemented using the tags that share the ``vulkan:`` prefix,
+as follows:
+- For memory operations:
+  - Operations with ``vulkan:nonprivate`` should bypass the cache.
+  - Operations with ``vulkan:private`` should be cached.
+  - Operations that specify neither or both should conservatively
+    bypass the cache to ensure correctness.
+- For synchronizing operations:
+  - Operations with ``vulkan:nonprivate`` should not flush or
+    invalidate the cache.
+  - Operations with ``vulkan:private`` should flush or invalidate the cache.
+  - Operations that specify neither or both should conservatively
+    flush or invalidate the cache to ensure correctness.
+.. note::
+   In such an implementation, dropping the metadata on an operation, while
+   not affecting correctness, may have big performance implications.
+   e.g. an operation bypasses the cache when it shouldn't.
+Memory Types
+------------
+MMRAs may express the selective synchronization of
+different memory types.
+As an example, a target may expose an ``sync-as:<N>`` tag to
+pass information about which address spaces are synchronized by the
+execution of a synchronizing operation.
+.. note::
+  Address spaces are used here as a common example, but this concept isn't
+  can apply for other "memory types". What "memory types" means here is
+  up to the target.
+.. code-block::
+   # let 1 = global address space
+   # let 3 = local address space
+   Thread T1:
+   A: store %ptr1                                  # sync-as:1
+   B: store %ptr2                                  # sync-as:3
+   X: store atomic release ptr addrspace(0) %ptr3  # sync-as:3
+   Thread T2:
+   Y: load atomic acquire ptr addrspace(0) %ptr3   # sync-as:3
+   C: load %ptr2                                   # sync-as:3
+   D: load %ptr1                                   # sync-as:1
+In the above figure, ``X`` and ``Y`` are atomic operations on a
+location in the ``global``  address space. If ``X`` synchronizes with
+``Y``, then ``B`` happens-before ``C`` in the ``local`` address
+space. But no such statement can be made about operations ``A`` and
+``D``, although they are peformed on a location in the ``global``
+address space.
+Implementation Example: Adding Address Space Information to Fences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Languages such as OpenCL C provide fence operations such as
+``atomic_work_item_fence`` that can take an explicit address
+space to fence.
+By default, LLVM has no means to carry that information in the IR, so
+the information is lost during lowering to LLVM IR. This means that
+targets such as AMDGPU have to conservatively emit instructions to
+fence all address spaces in all cases, which can have a noticeable
+performance impact in high-performance applications.
+MMRAs may be used to preserve that information at the IR level, all the
+way through code generation. For example, a fence that only affects the
+global address space ``addrspace(1)`` may be lowered as
+.. code-block::
+    fence release # sync-as:1
+and the target may use the presence of ``sync-as:1`` to infer that it
+must only emit instruction to fence the global address space.
+Note that as MMRAs are opt in, a fence that does not have MMRA metadata
+could still be lowered conservatively, so this optimization would only
+apply if the front-end emits the MMRA metadata on the fence instructions.
+Additional Topics
+=================
+.. note::
+  The following sections are informational.
+Performance Impact
+------------------
+MMRAs are a way to capture optimization opportunities in the program.
+But when an operation mentions no tags or conflicting tags,
+the target may need to produce conservative code to ensure correctness
+at the cost of performance. This can happen in the following situations:
+1. When a target first introduces MMRAs, the
+   frontend might not have been updated to emit them.
+2. An optimization may drop MMRA metadata.
+3. An optimization may add arbitrary tags to an operation.
+Note that targets can always choose to ignore (or even drop) MMRAs
+and revert to the default behavior/codegen heuristics without
+affecting correctness.
+Consequences of the Absence of *happens-before*
+-----------------------------------------------
+In the :ref:`happens-before<HappensBefore>` section, we defined how an
+*happens-before* relation between two instruction can be broken
+by leveraging compatibility between MMRAs. When the instructions
+are incompatible and there is no *happens-before* relation, we say
+that the instructions "do not have to be ordered relative to each
+other".
+"Ordering" in this context is a very broad term which covers both
+static and runtime aspects.
+When there is no ordering constraint, we *could* statically reorder
+the instructions in an optimizer transform if the reordering does
+not break other constraints as single location coherence.
+Static reordering is one consequence of breaking *happens-before*,
+but is not the most interesting one.
+Run-time consequences are more interesting. When there is an
+*happens-before* relation between instructions, the target has to emit
+synchronization code to ensure other threads will observe the effects of
+the instructions in the right order.
+For instance, the target may have to wait for previous loads & stores to
+finish before starting a fence-release, or there may be a need to flush a
+memory cache before executing the next instruction.
+In the absence of *happens-before*, there is no such requirement and
+no waiting or flushing is required. This may noticeably speed up
+execution in some cases.
+Combining Operations
+--------------------
+If a pass can combine multiple memory or synchronizing operations
+into one, then the metadata of the new instruction(s) shall be a
+prefix-wise union of the metadata of the source instructions.
+Let A and B be two tags set, and U be the prefix-wise union of A and B.
+For every unique tag prefix P present in A or B:
+* If either A or B has no tags with prefix P, no tags with prefix
+  P are added to U.
+* If both A and B have at least one tag with prefix P, only the tags
+  common to A and B are added to U.
+Examples:
+.. code-block::
+    A: store release %ptr1  # foo:x, foo:y, bar:x
+    B: store release %ptr2  # foo:x, bar:y
+    # Unique prefixes P = [foo, bar]
+    # "foo:x" is common to A and B so it's added to U.
+    # "bar:x" != "bar:y" so it's not added to U.
+    U: store release %ptr3  # foo:x
+.. code-block::
+    A: store release %ptr1  # foo:x, foo:y
+    B: store release %ptr2  # foo:x, bux:y
+    # Unique prefixes P = [foo, bux]
+    # "foo:x" is common to A and B so it's added to U.
+    # No tags have the prefix "bux" in A.
+    U: store release %ptr3  # foo:x
+.. code-block::
+    A: store release %ptr1
+    B: store release %ptr2  # foo:x, bar:y
+    # Unique prefixes P = [foo, bar]
+    # No tags with "foo" or "bar" in A, so no tags added.
+    U: store release %ptr3
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index 3a1d1665be439e2..1661c8c533db1d2 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -39,6 +39,7 @@ LLVM and API reference documentation.
    PDB/index
    PointerAuth
    ScudoHardenedAllocator
+   MemoryModelRelaxationAnnotations
    MemTagSanitizer
    Security
    SecurityTransparencyReports
@@ -194,6 +195,9 @@ Additional Topics
 :doc:`ScudoHardenedAllocator`
   A library that implements a security-hardened `malloc()`.
 
+:doc:`MemoryModelRelaxationAnnotations`
+  Target-defined relaxation to LLVM's concurrency model.
+
 :doc:`MemTagSanitizer`
   Security hardening for production code aiming to mitigate memory
   related vulnerabilities. Based on the Armv8.5-A Memory Tagging Extension.
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index 7a92e62b53c53db..d1a16fcd7c5ebb6 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -301,7 +301,7 @@ MDNode *intersectAccessGroups(const Instruction *Inst1,
                               const Instruction *Inst2);
 
 /// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
-/// MD_nontemporal, MD_access_group].
+/// MD_nontemporal, MD_access_group, MD_MMRA].
 /// For K in Kinds, we get the MDNode for K from each of the
 /// elements of VL, compute their "intersection" (i.e., the most generic
 /// metadata value that covers all of the individual values), and set I's
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
index 1387a0a37561c4b..2dc23e864376561 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -52,6 +52,8 @@ struct MachineIRBuilderState {
   DebugLoc DL;
   /// PC sections metadata to be set to any instruction we create.
   MDNode *PCSections = nullptr;
+  /// MMRA Metadata to be set on any instruction we create.
+  MDNode *MMRA = nullptr;
 
   /// \name Fields describing the insertion point.
   /// @{
@@ -353,6 +355,7 @@ class MachineIRBuilder {
     setMBB(*MI.getParent());
     State.II = MI.getIterator();
     setPCSections(MI.getPCSections());
+    setMMRAMetadata(MI.getMMRAMetadata());
   }
   /// @}
 
@@ -383,9 +386,15 @@ class MachineIRBuilder {
   /// Set the PC sections metadata to \p MD for all the next build instructions.
   void setPCSections(MDNode *MD) { State.PCSections = MD; }
 
+  /// Set the PC sections metadata to \p MD for all the next build instructions.
+  void setMMRAMetadata(MDNode *MMRA) { State.MMRA = MMRA; }
+
   /// Get the current instruction's PC sections metadata.
   MDNode *getPCSections() { return State.PCSections; }
 
+  /// Get the current instruction's MMRA metadata.
+  MDNode *getMMRAMetadata() { return State.MMRA; }
+
   /// Build and insert <empty> = \p Opcode <empty>.
   /// The insertion point is t...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/78569