[llvm] a45080f - [AMDGPU] Document amdgpu-as in AMDGPUUsage (#94335)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 11 05:31:30 PDT 2024
Author: Pierre van Houtryve
Date: 2024-06-11T14:31:26+02:00
New Revision: a45080f09181517c9c5eb5099a6b6ac67a48424a
URL: https://github.com/llvm/llvm-project/commit/a45080f09181517c9c5eb5099a6b6ac67a48424a
DIFF: https://github.com/llvm/llvm-project/commit/a45080f09181517c9c5eb5099a6b6ac67a48424a.diff
LOG: [AMDGPU] Document amdgpu-as in AMDGPUUsage (#94335)
Add a section about fence & address spaces that covers amdgpu-as.
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index aa50ce329d1de..b7ec1b51ee247 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -5980,6 +5980,33 @@ following sections:
* :ref:`amdgpu-amdhsa-memory-model-gfx942`
* :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`
+.. _amdgpu-fence-as:
+
+Fence and Address Spaces
+++++++++++++++++++++++++++++++
+
+LLVM fences do not have address space information, thus, fence
+codegen usually needs to conservatively synchronize all address spaces.
+
+In the case of OpenCL, where fences only need to synchronize
+user-specified address spaces, this can result in extra unnecessary waits.
+For instance, a fence that is supposed to only synchronize local memory will
+also have to wait on all global memory operations, which is unnecessary.
+
+:doc:`Memory Model Relaxation Annotations <MemoryModelRelaxationAnnotations>` can
+be used as an optimization hint for fences to solve this problem.
+The AMDGPU backend recognizes the following tags on fences:
+
+- ``amdgpu-as:local`` - fence only the local address space
+- ``amdgpu-as:global``- fence only the global address space
+
+.. note::
+
+ As an optimization hint, those tags are not guaranteed to survive until
+ code generation. Optimizations are free to drop the tags to allow for
+ better code optimization, at the cost of synchronizing additional address
+ spaces.
+
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
Memory Model GFX6-GFX9
@@ -6317,21 +6344,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
- If OpenCL and
address space is
not generic, omit.
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Must happen after
any preceding
local/generic load
@@ -6363,14 +6378,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -6573,21 +6583,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
- If OpenCL and
address space is
not generic, omit.
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Must happen after
any preceding
local/generic
@@ -6623,21 +6621,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -6967,14 +6953,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -7915,21 +7896,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -7988,14 +7957,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8066,14 +8030,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8441,21 +8400,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -8501,21 +8448,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8583,21 +8518,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -9218,14 +9141,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -9327,14 +9245,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -10290,21 +10203,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -10363,14 +10264,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -10441,14 +10337,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -10847,21 +10738,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -10920,21 +10799,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -10999,21 +10866,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -11662,14 +11517,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -11771,14 +11621,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -12624,21 +12469,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
address space is
local, omit
vmcnt(0) and vscnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0), s_waitcnt
@@ -12721,14 +12554,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
address space is
local, omit
vmcnt(0) and vscnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0), s_waitcnt
@@ -13092,21 +12920,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
address space is
local, omit
vmcnt(0) and vscnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0), s_waitcnt
@@ -13165,21 +12981,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
address space is
local, omit
vmcnt(0) and vscnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0), s_waitcnt
@@ -13731,14 +13535,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
address space is
local, omit
vmcnt(0) and vscnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0), s_waitcnt
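
For illustration, the tags documented in the new "Fence and Address Spaces"
section could be attached to a fence roughly as shown here. This is a minimal
sketch and not part of the commit. It assumes the ``!mmra`` metadata syntax
described in the Memory Model Relaxation Annotations document, where a tag
written as ``amdgpu-as:local`` is spelled as a prefix/suffix pair. The function
name and the choice of syncscope and ordering are hypothetical.

.. code-block:: llvm

  ; Hypothetical function, for illustration only.
  define void @local_only_fence() {
    ; Hint that only the local address space needs to be synchronized.
    fence syncscope("workgroup") release, !mmra !0
    ret void
  }

  ; The tag "amdgpu-as:local" as an MMRA prefix/suffix pair (assumed spelling).
  !0 = !{!"amdgpu-as", !"local"}

Because the tag is only an optimization hint, a fence whose tag has been
dropped remains correct: it simply synchronizes additional address spaces.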