[llvm] [AMDGPU] Document amdgpu-as in AMDGPUUsage (PR #94335)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 4 03:54:16 PDT 2024
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Pierre van Houtryve (Pierre-vh)
<details>
<summary>Changes</summary>
Add a section about fence & address spaces that covers amdgpu-as.
---
Patch is 46.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94335.diff
1 Files Affected:
- (modified) llvm/docs/AMDGPUUsage.rst (+103-306)
``````````diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index bb6751038fc9c..7510c4ae644c6 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -5969,6 +5969,31 @@ following sections:
* :ref:`amdgpu-amdhsa-memory-model-gfx942`
* :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`
+.. _amdgpu-fence-as:
+
+Fence and Address Spaces
+++++++++++++++++++++++++++++++
+
+LLVM fences do not carry address space information, so fence
+codegen usually needs to be conservative and fence all address spaces.
+
+In the case of OpenCL, where synchronization can only happen in the
+same address space, this can result in unnecessary waits.
+For instance, a fence that is supposed to only target local memory will
+also have to wait on all global memory operations.
+
+:doc:`Memory Model Relaxation Annotations <MemoryModelRelaxationAnnotations>` can
+be used as an optimization hint for fences to solve this problem.
+The AMDGPU backend handles the following tags on fences:
+
+- ``amdgpu-as:local`` - fence only the local address space
+- ``amdgpu-as:global`` - fence only the global address space
+
+This can avoid unnecessary waiting in many cases. However, these annotations are
+attached as metadata, which the optimizer may drop whenever preserving it would
+inhibit an optimization whose benefit outweighs the cost of losing the
+annotation; the fence then conservatively covers all address spaces again.
+
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
Memory Model GFX6-GFX9
@@ -6306,21 +6331,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
- If OpenCL and
address space is
not generic, omit.
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Must happen after
any preceding
local/generic load
@@ -6352,14 +6365,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -6562,21 +6570,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
- If OpenCL and
address space is
not generic, omit.
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Must happen after
any preceding
local/generic
@@ -6612,21 +6608,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -6956,14 +6940,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -7904,21 +7883,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -7977,14 +7944,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8055,14 +8017,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8430,21 +8387,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -8490,21 +8435,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8572,21 +8505,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL...
[truncated]
``````````
</details>
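For readers unfamiliar with how such a tag is attached, here is a minimal sketch of a fence carrying the `amdgpu-as:local` tag from the new "Fence and Address Spaces" section. It assumes the usual MMRA encoding, where a `prefix:suffix` tag is written as a pair of metadata strings under `!mmra`. The kernel name, sync scope, and ordering are illustrative and not taken from the patch.

```llvm
; Hypothetical kernel, for illustration only.
define amdgpu_kernel void @local_only_fence() {
entry:
  ; Without !mmra, the backend has to fence every address space.
  ; With the tag below, it is allowed to fence only local memory.
  fence syncscope("workgroup") release, !mmra !0
  ret void
}

; Assumed encoding of the "amdgpu-as:local" tag: !{prefix, suffix}.
!0 = !{!"amdgpu-as", !"local"}
```

If the metadata is dropped, the fence is still correct. It only loses the hint and reverts to the conservative behavior.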
https://github.com/llvm/llvm-project/pull/94335