[llvm] AMDGPU: Add description for new atomicrmw metadata (PR #85052)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 18 07:52:38 PDT 2024
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/85052
From 2cc2dd648782dd43fe21969f887f28751a5591b3 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Wed, 13 Mar 2024 14:19:33 +0530
Subject: [PATCH 01/13] AMDGPU: Don't use table for metadata docs, and fix
section headers
I couldn't figure out how to nicely embed a table within a table column.
Copy the formatting that LangRef uses for metadata, and introduce a
metadata section with subsections for each item. Also replace subsection
markers that were used in place of section markers, to avoid Sphinx errors.
---
llvm/docs/AMDGPUUsage.rst | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index fd9ad7fac19a95..fe37e85c2a40a6 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1312,24 +1312,30 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
List AMDGPU intrinsics.
+.. _amdgpu_metadata:
+
LLVM IR Metadata
-------------------
+================
+
+The AMDGPU backend implements the following target custom LLVM IR
+metadata.
+
+.. _amdgpu_last_use:
-The AMDGPU backend implements the following LLVM IR metadata.
+'``amdgpu.last.use``' Metadata
+------------------------------
+
+Sets TH_LOAD_LU temporal hint on load instructions that support it.
+Takes priority over nontemporal hint (TH_LOAD_NT). This takes no
+arguments.
+
+.. code-block:: llvm
-.. list-table:: AMDGPU LLVM IR Metatdata
- :name: amdgpu-llvm-ir-metadata-table
+ %val = load i32, ptr %in, align 4, !amdgpu.last.use !{}
- * - Metadata Name
- - Description
- - Values
- * - !amdgpu.last.use
- - Sets TH_LOAD_LU temporal hint on load instructions that support it.
- Takes priority over nontemporal hint (TH_LOAD_NT).
- - {}
LLVM IR Attributes
-------------------
+==================
The AMDGPU backend supports the following LLVM IR attributes.
@@ -1451,7 +1457,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
======================================= ==========================================================
Calling Conventions
--------------------
+===================
The AMDGPU backend supports the following calling conventions:
From 29794bc1bdc50a7d06ce3a62ad95b4800f631650 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Wed, 13 Mar 2024 13:08:48 +0530
Subject: [PATCH 02/13] AMDGPU: Add description for
amdgpu.no.access.location.types metadata
Add a spec for yet-to-be-implemented metadata to allow the backend to
fully handle atomicrmw lowering. This is the base of an alternative
to #69229, which inverts the direction to be correct by default, and
extends to cover the peer device case.
Could use a better name
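
For illustration only (not part of the patch): a minimal Python sketch of how
the single integer argument documented here could be read as a bitfield. The
helper name `decode` and the label strings are hypothetical; only the bit
positions and their meanings come from the table added to AMDGPUUsage.rst.

```python
# Bit position -> assertion, following the documented table.
# The label strings are illustrative, not names defined by LLVM.
BITS = {
    0: "not in fine-grained host memory",
    1: "not in a remote connected peer device",
}


def decode(value):
    """Return the assertions encoded in the metadata's integer operand.

    A value of 0 returns an empty list, which corresponds to the
    documented rule that 0 is equivalent to removing the metadata.
    """
    return [label for bit, label in BITS.items() if value & (1 << bit)]


# The three operands used in the documentation example: !{i32 3}, !{i32 1}, !{i32 2}
for operand in (3, 1, 2):
    print(operand, decode(operand))
```

The operand 3 sets both bits, so it carries both assertions at once.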
---
llvm/docs/AMDGPUUsage.rst | 43 ++++++++++++++++++++++++++++++++++++++
llvm/docs/ReleaseNotes.rst | 2 ++
2 files changed, 45 insertions(+)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index fe37e85c2a40a6..a6556bbd1752b9 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1333,6 +1333,49 @@ arguments.
%val = load i32, ptr %in, align 4, !amdgpu.last.use !{}
+.. _amdgpu_no_access_location_types:
+
+'``amdgpu.no.access.location.types``' Metadata
+----------------------------------------------
+
+Asserts a memory access does not access bytes residing in certain
+allocation kinds. This is intended for use with :ref:`atomicrmw
+<i_atomicrmw>` and other atomic instructions. This is required to emit
+a native hardware instruction for some :ref:`system scope
+<amdgpu-memory-scopes>` atomic operations on some subtargets. An
+:ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
+conservatively as required to preserve the operation behavior in all
+cases.
+
+If the memory operation does access an address in an indicated region,
+any stored values and any returned results are :ref:`poison
+<poisonvalues>`. This has a single integer argument, interpreted as a
+bitfield. A 0 value is equivalent to removing the metadata.
+
+.. list-table::
+
+ * - Bit
+ - Description
+ * - 0
+ - Not in fine-grained host memory.
+ * - 1
+ - Not in a remote connected peer device (address must be device local)
+
+.. code-block:: llvm
+
+ ; Indicates the access does not access fine-grained memory, or
+ ; remote device memory.
+ %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.access.location.types !0
+
+ ; Indicates the access does not access fine-grained memory.
+ %old1 = atomicrmw sub ptr %ptr1, i32 1 acquire, !amdgpu.no.access.location.types !1
+
+ ; Indicates the access does not access peer device memory.
+ %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.access.location.types !2
+
+ !0 = !{i32 3}
+ !1 = !{i32 1}
+ !2 = !{i32 2}
LLVM IR Attributes
==================
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index b34a5f31c5eb0a..95ebbb74fbbd7f 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -71,6 +71,8 @@ Changes to the AMDGPU Backend
-----------------------------
* Implemented the ``llvm.get.fpenv`` and ``llvm.set.fpenv`` intrinsics.
+* Added ``!amdgpu.no.access.location.types`` metadata to control
+ atomic behavior.
Changes to the ARM Backend
--------------------------
From 61553035b313eeb37681aa16d93eb008269f5734 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Mon, 15 Apr 2024 12:36:19 +0200
Subject: [PATCH 03/13] Add comments to metadata examples
---
llvm/docs/AMDGPUUsage.rst | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 997d9d71e0ce82..0375812ec63ca1 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1373,9 +1373,10 @@ bitfield. A 0 value is equivalent to removing the metadata.
; Indicates the access does not access peer device memory.
%old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.access.location.types !2
- !0 = !{i32 3}
- !1 = !{i32 1}
- !2 = !{i32 2}
+ !0 = !{i32 3} ; no_fine_grained_memory_access | no_remote_memory_access
+ !1 = !{i32 1} ; no_fine_grained_memory_access
+ !2 = !{i32 2} ; no_remote_memory_access
+
LLVM IR Attributes
==================
From 4c5c29faf0f5320e0df4b96c374b3d9f706678ec Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Mon, 15 Apr 2024 16:53:00 +0200
Subject: [PATCH 04/13] Split into separate metadata components
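
For illustration only (not part of the patch): a rough Python sketch of how a
value of the earlier bitfield would correspond to the separate metadata kinds
introduced here. The helper name `attachments` is hypothetical, and the
metadata names are the ones used at this revision; later patches in the series
rename the first one.

```python
# Old bitfield bit position -> metadata kind introduced by this revision.
SPLIT = {
    0: "amdgpu.no.fine.grained.host.memory",
    1: "amdgpu.no.remote.memory.access",
}


def attachments(old_value, node="!0"):
    """Render the metadata attachments equivalent to an old bitfield value.

    `node` is the empty metadata node (!0 = !{}) that each attachment
    points at, as in the documentation examples.
    """
    return "".join(
        f", !{kind} {node}"
        for bit, kind in SPLIT.items()
        if old_value & (1 << bit)
    )


# Rebuild the first documentation example from the old operand !{i32 3}.
print("%old0 = atomicrmw sub ptr %ptr0, i32 1 acquire" + attachments(3))
```

An old value of 0 yields no attachments at all, which is the conservative
default for an unannotated atomicrmw.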
---
llvm/docs/AMDGPUUsage.rst | 66 ++++++++++++++++++++++-----------------
1 file changed, 37 insertions(+), 29 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 0375812ec63ca1..191948a2ce6b66 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1333,50 +1333,58 @@ arguments.
%val = load i32, ptr %in, align 4, !amdgpu.last.use !{}
-.. _amdgpu_no_access_location_types:
+.. _amdgpu_no_fine_grained_host_memory:
-'``amdgpu.no.access.location.types``' Metadata
-----------------------------------------------
+'``amdgpu.no.fine.grained.host.memory``' Metadata
+-------------------------------------------------
-Asserts a memory access does not access bytes residing in certain
-allocation kinds. This is intended for use with :ref:`atomicrmw
-<i_atomicrmw>` and other atomic instructions. This is required to emit
-a native hardware instruction for some :ref:`system scope
-<amdgpu-memory-scopes>` atomic operations on some subtargets. An
+Asserts a memory access does not access bytes allocated in fine
+grained allocated host memory. This is intended for use with
+:ref:`atomicrmw <i_atomicrmw>` and other atomic instructions. This is
+required to emit a native hardware instruction for some :ref:`system
+scope <amdgpu-memory-scopes>` atomic operations on some subtargets. An
:ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
conservatively as required to preserve the operation behavior in all
-cases.
+cases. This will typically be used in conjunction with
+:ref:`!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`.
-If the memory operation does access an address in an indicated region,
-any stored values and any returned results are :ref:`poison
-<poisonvalues>`. This has a single integer argument, interpreted as a
-bitfield. A 0 value is equivalent to removing the metadata.
+.. code-block:: llvm
+
+ ; Indicates the access does not access fine-grained memory, or
+ ; remote device memory.
+ %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.host.memory !0, !amdgpu.no.remote.memory.access !0
+
+ ; Indicates the access does not access peer device memory.
+ %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.fine.grained.host.memory !0
-.. list-table::
+ !0 = !{}
+
+.. _amdgpu_no_remote_memory_access:
+
+'``amdgpu.no.remote.memory.access``' Metadata
+---------------------------------------------
+
+Asserts a memory access does not access bytes in remote connected peer
+device memory (the device address must be device local). This is
+intended for use with :ref:`atomicrmw <i_atomicrmw>` and other atomic
+instructions. This is required to emit a native hardware instruction
+for some :ref:`system scope <amdgpu-memory-scopes>` atomic operations
+on some subtargets. An :ref:`atomicrmw <i_atomicrmw>` without metadata
+will be treated conservatively as required to preserve the operation
+behavior in all cases. This will typically be used in conjunction with
+:ref:`!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`.
- * - Bit
- - Description
- * - 0
- - Not in fine-grained host memory.
- * - 1
- - Not in a remote connected peer device (address must be device local)
.. code-block:: llvm
; Indicates the access does not access fine-grained memory, or
; remote device memory.
- %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.access.location.types !0
-
- ; Indicates the access does not access fine-grained memory.
- %old1 = atomicrmw sub ptr %ptr1, i32 1 acquire, !amdgpu.no.access.location.types !1
+ %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.host.memory !0, !amdgpu.no.remote.memory.access !0
; Indicates the access does not access peer device memory.
- %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.access.location.types !2
-
- !0 = !{i32 3} ; no_fine_grained_memory_access | no_remote_memory_access
- !1 = !{i32 1} ; no_fine_grained_memory_access
- !2 = !{i32 2} ; no_remote_memory_access
+ %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.remote.memory.access !0
+ !0 = !{}
LLVM IR Attributes
==================
From b8f471cfb7d39531a63fb7270949ca4115923b50 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Tue, 16 Apr 2024 13:34:21 +0200
Subject: [PATCH 05/13] Fix metadata reference links
---
llvm/docs/AMDGPUUsage.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 191948a2ce6b66..4ec75a8197001b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1346,7 +1346,7 @@ scope <amdgpu-memory-scopes>` atomic operations on some subtargets. An
:ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
conservatively as required to preserve the operation behavior in all
cases. This will typically be used in conjunction with
-:ref:`!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`.
+:ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`.
.. code-block:: llvm
@@ -1372,7 +1372,7 @@ for some :ref:`system scope <amdgpu-memory-scopes>` atomic operations
on some subtargets. An :ref:`atomicrmw <i_atomicrmw>` without metadata
will be treated conservatively as required to preserve the operation
behavior in all cases. This will typically be used in conjunction with
-:ref:`!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`.
+:ref:`\!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`.
.. code-block:: llvm
From 2ee7ee618438bff8e2484601c5bf94c4b80e9dc4 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Tue, 16 Apr 2024 13:34:29 +0200
Subject: [PATCH 06/13] Define denormal mode atomic metadata
Alternatively, ignore.fpenv
---
llvm/docs/AMDGPUUsage.rst | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 4ec75a8197001b..038e3cccf004aa 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1386,6 +1386,26 @@ behavior in all cases. This will typically be used in conjunction with
!0 = !{}
+'``amdgpu.ignore.denormal.mode``' Metadata
+------------------------------------------
+
+For use with :ref:`atomicrmw <i_atomicrmw>` floating-point
+operations. Indicates the handling of denormal inputs and results is
+insignificant and may be inconsistent with the expected floating-point
+mode. This is necessary to emit a native atomic instruction on some
+targets for some address spaces. This is typically used in conjunction
+with :ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`
+and :ref:`\!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`
+
+
+.. code-block:: llvm
+
+ %res0 = atomicrmw fadd ptr addrspace(1) %ptr, float %value seq_cst, align 4, !amdgpu.ignore.denormal.mode !0
+ %res1 = atomicrmw fadd ptr addrspace(1) %ptr, float %value seq_cst, align 4, !amdgpu.ignore.denormal.mode !0, !amdgpu.no.fine.grained.host.memory !0, !amdgpu.no.remote.memory.access !0
+
+ !0 = !{}
+
+
LLVM IR Attributes
==================
From 9a15f6fbe3f1045c168b3a7a7bc23ad9d7551bcd Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:40:52 +0200
Subject: [PATCH 07/13] Drop host part of no fine-grained metadata
---
llvm/docs/AMDGPUUsage.rst | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 038e3cccf004aa..14160b709e7a71 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1333,13 +1333,13 @@ arguments.
%val = load i32, ptr %in, align 4, !amdgpu.last.use !{}
-.. _amdgpu_no_fine_grained_host_memory:
+.. _amdgpu_no_fine_grained_memory:
-'``amdgpu.no.fine.grained.host.memory``' Metadata
+'``amdgpu.no.fine.grained.memory``' Metadata
-------------------------------------------------
Asserts a memory access does not access bytes allocated in fine
-grained allocated host memory. This is intended for use with
+grained allocated memory. This is intended for use with
:ref:`atomicrmw <i_atomicrmw>` and other atomic instructions. This is
required to emit a native hardware instruction for some :ref:`system
scope <amdgpu-memory-scopes>` atomic operations on some subtargets. An
@@ -1352,10 +1352,10 @@ cases. This will typically be used in conjunction with
; Indicates the access does not access fine-grained memory, or
; remote device memory.
- %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.host.memory !0, !amdgpu.no.remote.memory.access !0
+ %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory.access !0
; Indicates the access does not access peer device memory.
- %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.fine.grained.host.memory !0
+ %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.fine.grained.memory !0
!0 = !{}
@@ -1379,7 +1379,7 @@ behavior in all cases. This will typically be used in conjunction with
; Indicates the access does not access fine-grained memory, or
; remote device memory.
- %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.host.memory !0, !amdgpu.no.remote.memory.access !0
+ %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory.access !0
; Indicates the access does not access peer device memory.
%old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.remote.memory.access !0
@@ -1395,13 +1395,13 @@ insignificant and may be inconsistent with the expected floating-point
mode. This is necessary to emit a native atomic instruction on some
targets for some address spaces. This is typically used in conjunction
with :ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`
-and :ref:`\!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`
+and :ref:`\!amdgpu.no.fine.grained.memory<amdgpu_no_fine_grained_memory>`
.. code-block:: llvm
%res0 = atomicrmw fadd ptr addrspace(1) %ptr, float %value seq_cst, align 4, !amdgpu.ignore.denormal.mode !0
- %res1 = atomicrmw fadd ptr addrspace(1) %ptr, float %value seq_cst, align 4, !amdgpu.ignore.denormal.mode !0, !amdgpu.no.fine.grained.host.memory !0, !amdgpu.no.remote.memory.access !0
+ %res1 = atomicrmw fadd ptr addrspace(1) %ptr, float %value seq_cst, align 4, !amdgpu.ignore.denormal.mode !0, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory.access !0
!0 = !{}
From ec02ead2d7206f328a9e496628a6e2944aba70dc Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:43:15 +0200
Subject: [PATCH 08/13] Add note that amdgpu.no.remote.memory.access is usually
sufficient to emit an instruction
---
llvm/docs/AMDGPUUsage.rst | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 14160b709e7a71..63751414e5f7e5 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1365,13 +1365,16 @@ cases. This will typically be used in conjunction with
---------------------------------------------
Asserts a memory access does not access bytes in remote connected peer
-device memory (the device address must be device local). This is
-intended for use with :ref:`atomicrmw <i_atomicrmw>` and other atomic
+device memory (the address must be device local). This is intended for
+use with :ref:`atomicrmw <i_atomicrmw>` and other atomic
instructions. This is required to emit a native hardware instruction
for some :ref:`system scope <amdgpu-memory-scopes>` atomic operations
-on some subtargets. An :ref:`atomicrmw <i_atomicrmw>` without metadata
-will be treated conservatively as required to preserve the operation
-behavior in all cases. This will typically be used in conjunction with
+on some subtargets. For most integer atomic operations, this is a
+sufficient restriction to emit a native atomic instruction.
+
+An :ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
+conservatively as required to preserve the operation behavior in all
+cases. This will typically be used in conjunction with
:ref:`\!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`.
From 79db05f0e4338a6e40581e01c4a74261d5194ed0 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:43:28 +0200
Subject: [PATCH 09/13] Rename
---
llvm/docs/AMDGPUUsage.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 63751414e5f7e5..7496e04cbd3b46 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1375,7 +1375,7 @@ sufficient restriction to emit a native atomic instruction.
An :ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
conservatively as required to preserve the operation behavior in all
cases. This will typically be used in conjunction with
-:ref:`\!amdgpu.no.fine.grained.host.memory<amdgpu_no_fine_grained_host_memory>`.
+:ref:`\!amdgpu.no.fine.grained.memory<amdgpu_no_fine_grained_memory>`.
.. code-block:: llvm
From 0678218b04f9ddc536a2ad7c0b73d908882e719f Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:45:42 +0200
Subject: [PATCH 10/13] Reorder documentation sections
---
llvm/docs/AMDGPUUsage.rst | 52 +++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7496e04cbd3b46..33bac6daf8f76d 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1333,20 +1333,22 @@ arguments.
%val = load i32, ptr %in, align 4, !amdgpu.last.use !{}
-.. _amdgpu_no_fine_grained_memory:
+'``amdgpu.no.remote.memory.access``' Metadata
+---------------------------------------------
-'``amdgpu.no.fine.grained.memory``' Metadata
--------------------------------------------------
+Asserts a memory access does not access bytes in remote connected peer
+device memory (the address must be device local). This is intended for
+use with :ref:`atomicrmw <i_atomicrmw>` and other atomic
+instructions. This is required to emit a native hardware instruction
+for some :ref:`system scope <amdgpu-memory-scopes>` atomic operations
+on some subtargets. For most integer atomic operations, this is a
+sufficient restriction to emit a native atomic instruction.
-Asserts a memory access does not access bytes allocated in fine
-grained allocated memory. This is intended for use with
-:ref:`atomicrmw <i_atomicrmw>` and other atomic instructions. This is
-required to emit a native hardware instruction for some :ref:`system
-scope <amdgpu-memory-scopes>` atomic operations on some subtargets. An
-:ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
+An :ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
conservatively as required to preserve the operation behavior in all
cases. This will typically be used in conjunction with
-:ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`.
+:ref:`\!amdgpu.no.fine.grained.memory<amdgpu_no_fine_grained_memory>`.
+
.. code-block:: llvm
@@ -1355,28 +1357,24 @@ cases. This will typically be used in conjunction with
%old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory.access !0
; Indicates the access does not access peer device memory.
- %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.fine.grained.memory !0
+ %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.remote.memory.access !0
!0 = !{}
-.. _amdgpu_no_remote_memory_access:
-
-'``amdgpu.no.remote.memory.access``' Metadata
----------------------------------------------
+.. _amdgpu_no_fine_grained_memory:
-Asserts a memory access does not access bytes in remote connected peer
-device memory (the address must be device local). This is intended for
-use with :ref:`atomicrmw <i_atomicrmw>` and other atomic
-instructions. This is required to emit a native hardware instruction
-for some :ref:`system scope <amdgpu-memory-scopes>` atomic operations
-on some subtargets. For most integer atomic operations, this is a
-sufficient restriction to emit a native atomic instruction.
+'``amdgpu.no.fine.grained.memory``' Metadata
+-------------------------------------------------
-An :ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
+Asserts a memory access does not access bytes allocated in fine
+grained allocated memory. This is intended for use with
+:ref:`atomicrmw <i_atomicrmw>` and other atomic instructions. This is
+required to emit a native hardware instruction for some :ref:`system
+scope <amdgpu-memory-scopes>` atomic operations on some subtargets. An
+:ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
conservatively as required to preserve the operation behavior in all
cases. This will typically be used in conjunction with
-:ref:`\!amdgpu.no.fine.grained.memory<amdgpu_no_fine_grained_memory>`.
-
+:ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`.
.. code-block:: llvm
@@ -1385,10 +1383,12 @@ cases. This will typically be used in conjunction with
%old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory.access !0
; Indicates the access does not access peer device memory.
- %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.remote.memory.access !0
+ %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.fine.grained.memory !0
!0 = !{}
+.. _amdgpu_no_remote_memory_access:
+
'``amdgpu.ignore.denormal.mode``' Metadata
------------------------------------------
From e6cb77fce3bfcb63112b0f57d65978ca3d70c758 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:46:23 +0200
Subject: [PATCH 11/13] Clarify no remote implies host or peer device
---
llvm/docs/AMDGPUUsage.rst | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 33bac6daf8f76d..ca1b9c6a15d498 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1336,13 +1336,14 @@ arguments.
'``amdgpu.no.remote.memory.access``' Metadata
---------------------------------------------
-Asserts a memory access does not access bytes in remote connected peer
-device memory (the address must be device local). This is intended for
-use with :ref:`atomicrmw <i_atomicrmw>` and other atomic
-instructions. This is required to emit a native hardware instruction
-for some :ref:`system scope <amdgpu-memory-scopes>` atomic operations
-on some subtargets. For most integer atomic operations, this is a
-sufficient restriction to emit a native atomic instruction.
+Asserts a memory access does not access bytes in host memory, or
+remote connected peer device memory (the address must be device
+local). This is intended for use with :ref:`atomicrmw <i_atomicrmw>`
+and other atomic instructions. This is required to emit a native
+hardware instruction for some :ref:`system scope
+<amdgpu-memory-scopes>` atomic operations on some subtargets. For most
+integer atomic operations, this is a sufficient restriction to emit a
+native atomic instruction.
An :ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
conservatively as required to preserve the operation behavior in all
From e3598801762b104a4b94b90624c7daf0b886fe86 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:49:41 +0200
Subject: [PATCH 12/13] Consistently spell fine-grained
---
llvm/docs/AMDGPUUsage.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index ca1b9c6a15d498..25a006341623f9 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1367,8 +1367,8 @@ cases. This will typically be used in conjunction with
'``amdgpu.no.fine.grained.memory``' Metadata
-------------------------------------------------
-Asserts a memory access does not access bytes allocated in fine
-grained allocated memory. This is intended for use with
+Asserts a memory access does not access bytes allocated in
+fine-grained allocated memory. This is intended for use with
:ref:`atomicrmw <i_atomicrmw>` and other atomic instructions. This is
required to emit a native hardware instruction for some :ref:`system
scope <amdgpu-memory-scopes>` atomic operations on some subtargets. An
@@ -1383,7 +1383,7 @@ cases. This will typically be used in conjunction with
; remote device memory.
%old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory.access !0
- ; Indicates the access does not access peer device memory.
+ ; Indicates the access does not access fine-grained memory
%old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.fine.grained.memory !0
!0 = !{}
From f46a8aabbb56b349a417d2f819ff5e388a680224 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Thu, 18 Apr 2024 16:52:18 +0200
Subject: [PATCH 13/13] Another note about amdgpu.ignore.denormal.mode
---
llvm/docs/AMDGPUUsage.rst | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 25a006341623f9..0b663108b1b797 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1397,9 +1397,11 @@ For use with :ref:`atomicrmw <i_atomicrmw>` floating-point
operations. Indicates the handling of denormal inputs and results is
insignificant and may be inconsistent with the expected floating-point
mode. This is necessary to emit a native atomic instruction on some
-targets for some address spaces. This is typically used in conjunction
-with :ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`
-and :ref:`\!amdgpu.no.fine.grained.memory<amdgpu_no_fine_grained_memory>`
+targets for some address spaces where float denormals are
+unconditionally flushed. This is typically used in conjunction with
+:ref:`\!amdgpu.no.remote.memory.access<amdgpu_no_remote_memory_access>`
+and
+:ref:`\!amdgpu.no.fine.grained.memory<amdgpu_no_fine_grained_memory>`
.. code-block:: llvm