[llvm] 802e3f4 - [AMDGPU] Add GFX11 documentation to AMDGPUUsage
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 10 00:15:12 PDT 2022
Author: Tony
Date: 2022-06-10T08:10:34+01:00
New Revision: 802e3f4f57fd20579bfd6375eb926cffca0485af
URL: https://github.com/llvm/llvm-project/commit/802e3f4f57fd20579bfd6375eb926cffca0485af
DIFF: https://github.com/llvm/llvm-project/commit/802e3f4f57fd20579bfd6375eb926cffca0485af.diff
LOG: [AMDGPU] Add GFX11 documentation to AMDGPUUsage
Update most of the document to include GFX11. Memory model changes will
come later.
Differential Revision: https://reviews.llvm.org/D127402
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 81dad6a487cce..4f51f872fe57b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -698,7 +698,7 @@ supported for the ``amdgcn`` target.
can be used. For GFX7-GFX8 these are available in the
:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
- GFX9-GFX10 the aperture base addresses are directly available as inline
+ GFX9-GFX11 the aperture base addresses are directly available as inline
constant registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``.
In 64-bit address mode the aperture sizes are 2^32 bytes and the base is
aligned to 2^32 which makes it easier to convert from flat to segment or
@@ -768,7 +768,7 @@ supported for the ``amdgcn`` target.
instructions with the scratch buffer descriptor and per wavefront scratch
offset, by the scratch instructions, or by flat instructions. Multi-dword
access is not supported except by flat and scratch instructions in
- GFX9-GFX10.
+ GFX9-GFX11.
**Constant 32-bit**
*TODO*
@@ -2536,10 +2536,10 @@ DWARF Version 5 section 6.2.4):
AMDGPU does not use a segment selector so this is 0.
``minimum_instruction_length`` (ubyte)
- For GFX9-GFX10 this is 4.
+ For GFX9-GFX11 this is 4.
``maximum_operations_per_instruction`` (ubyte)
- For GFX9-GFX10 this is 1.
+ For GFX9-GFX11 this is 1.
Source text for online-compiled programs (for example, those compiled by the
OpenCL language runtime) may be embedded into the DWARF Version 5 line table.
@@ -3022,7 +3022,7 @@ non-AMD key names should be prefixed by "*vendor-name*.".
"NumSGPRs" integer Required Number of scalar
registers used by a
wavefront for
- GFX6-GFX10. This
+ GFX6-GFX11. This
includes the special
SGPRs for VCC, Flat
Scratch (GFX7-GFX10)
@@ -3038,7 +3038,7 @@ non-AMD key names should be prefixed by "*vendor-name*.".
"NumVGPRs" integer Required Number of vector
registers used by
each work-item for
- GFX6-GFX10
+ GFX6-GFX11
"MaxFlatWorkGroupSize" integer Required Maximum flat
work-group size
supported by the
@@ -3793,10 +3793,10 @@ the scratch buffer descriptor and per wavefront scratch offset, by the scratch
instructions, or by flat instructions. If each lane of a wavefront accesses the
same private address, the interleaving results in adjacent dwords being accessed
and hence requires fewer cache lines to be fetched. Multi-dword access is not
-supported except by flat and scratch instructions in GFX9-GFX10.
+supported except by flat and scratch instructions in GFX9-GFX11.
The generic address space uses the hardware flat address support available in
-GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and
+GFX7-GFX11. This uses two fixed ranges of virtual addresses (the private and
local apertures), that are outside the range of addressible global memory, to
map from a flat address to a private or local address.
@@ -3812,7 +3812,7 @@ To convert between a segment address and a flat address the base address of the
apertures address can be used. For GFX7-GFX8 these are available in the
:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
-GFX9-GFX10 the aperture base addresses are directly available as inline constant
+GFX9-GFX11 the aperture base addresses are directly available as inline constant
registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit
address mode the aperture sizes are 2^32 bytes and the base is aligned to 2^32
which makes it easier to convert from flat to segment or segment to flat.
@@ -3936,28 +3936,28 @@ The fields used by CP for code objects before V3 also match those specified in
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
- GFX10
+ GFX10-GFX11
Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC3``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC1``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC2``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
458:448 7 bits *See separate bits below.* Enable the setup of the
SGPR user data registers
(see
@@ -3991,7 +3991,7 @@ The fields used by CP for code objects before V3 also match those specified in
457:455 3 bits Reserved, must be 0.
458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9
Reserved, must be 0.
- GFX10
+ GFX10-GFX11
- If 0 execute in
wavefront size 64 mode.
- If 1 execute in
@@ -4008,8 +4008,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc1 for GFX6-GFX10
- :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table
+ .. table:: compute_pgm_rsrc1 for GFX6-GFX11
+ :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4027,10 +4027,10 @@ The fields used by CP for code objects before V3 also match those specified in
- vgprs_used = align(arch_vgprs, 4)
+ acc_vgprs
- max(0, ceil(vgprs_used / 8) - 1)
- GFX10 (wavefront size 64)
+ GFX10-GFX11 (wavefront size 64)
- max_vgpr 1..256
- max(0, ceil(vgprs_used / 4) - 1)
- GFX10 (wavefront size 32)
+ GFX10-GFX11 (wavefront size 32)
- max_vgpr 1..256
- max(0, ceil(vgprs_used / 8) - 1)
@@ -4064,7 +4064,7 @@ The fields used by CP for code objects before V3 also match those specified in
GFX9
- sgprs_used 0..112
- 2 * max(0, ceil(sgprs_used / 16) - 1)
- GFX10
+ GFX10-GFX11
Reserved, must be 0.
(128 SGPRs always
allocated.)
@@ -4231,7 +4231,7 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC1.CDBG_USER``.
26 1 bit FP16_OVFL GFX6-GFX8
Reserved, must be 0.
- GFX9-GFX10
+ GFX9-GFX11
Wavefront starts execution
with specified fp16 overflow
mode.
@@ -4250,7 +4250,7 @@ The fields used by CP for code objects before V3 also match those specified in
28:27 2 bits Reserved, must be 0.
29 1 bit WGP_MODE GFX6-GFX9
Reserved, must be 0.
- GFX10
+ GFX10-GFX11
- If 0 execute work-groups in
CU wavefront execution mode.
- If 1 execute work-groups on
@@ -4262,7 +4262,7 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC1.WGP_MODE``.
30 1 bit MEM_ORDERED GFX6-GFX9
Reserved, must be 0.
- GFX10
+ GFX10-GFX11
Controls the behavior of the
s_waitcnt's vmcnt and vscnt
counters.
@@ -4285,7 +4285,7 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC1.MEM_ORDERED``.
31 1 bit FWD_PROGRESS GFX6-GFX9
Reserved, must be 0.
- GFX10
+ GFX10-GFX11
- If 0 execute SIMD wavefronts
using oldest first policy.
- If 1 execute SIMD wavefronts to
@@ -4299,8 +4299,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc2 for GFX6-GFX10
- :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table
+ .. table:: compute_pgm_rsrc2 for GFX6-GFX11
+ :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4437,7 +4437,7 @@ The fields used by CP for code objects before V3 also match those specified in
GFX6
roundup(lds-size / (64 * 4))
- GFX7-GFX10
+ GFX7-GFX11
roundup(lds-size / (128 * 4))
24 1 bit ENABLE_EXCEPTION_IEEE_754_FP Wavefront starts execution
@@ -4493,8 +4493,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc3 for GFX10
- :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table
+ .. table:: compute_pgm_rsrc3 for GFX10-GFX11
+ :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4503,8 +4503,43 @@ The fields used by CP for code objects before V3 also match those specified in
wavefront size 64 the value is 0-15, representing 0-120 VGPRs (granularity
of 8), such that (compute_pgm_rsrc1.vgprs +1)*4 + shared_vgpr_count*8 does
not exceed 256. For wavefront size 32 shared_vgpr_count must be 0.
- 31:4 28 Reserved, must be 0.
- bits
+ 9:4 6 bits INST_PREF_SIZE GFX10
+ Reserved, must be 0.
+ GFX11
+ Number of instruction bytes to prefetch, starting at the kernel's entry
+ point instruction, before wavefront starts execution. The value is 0..63
+ with a granularity of 128 bytes.
+ 10 1 bit TRAP_ON_START GFX10
+ Reserved, must be 0.
+ GFX11
+ Must be 0.
+
+ If 1, wavefront starts execution by trapping into the trap handler.
+
+ CP is responsible for filling in the trap on start bit in
+ ``COMPUTE_PGM_RSRC3.TRAP_ON_START`` according to what the runtime
+ requests.
+ 11 1 bit TRAP_ON_END GFX10
+ Reserved, must be 0.
+ GFX11
+ Must be 0.
+
+ If 1, wavefront execution terminates by trapping into the trap handler.
+
+ CP is responsible for filling in the trap on end bit in
+ ``COMPUTE_PGM_RSRC3.TRAP_ON_END`` according to what the runtime requests.
+ 30:12 19 bits Reserved, must be 0.
+ 31 1 bit IMAGE_OP GFX10
+ Reserved, must be 0.
+ GFX11
+ If 1, the kernel execution contains image instructions. If executed as
+ part of a graphics pipeline, image read instructions will stall waiting
+ for any necessary ``WAIT_SYNC`` fence to be performed in order to
+ indicate that earlier pipeline stages have completed writing to the
+ image.
+
+ Not used for compute kernels that are not part of a graphics pipeline and
+ must be 0.
32 **Total size 4 bytes.**
======= ===================================================================================================================
@@ -4630,7 +4665,7 @@ SGPR register initial state is defined in
GFX7-GFX8 since it is the same
value as the second SGPR of
Flat Scratch Init. However, it
- may be needed for GFX9-GFX10 which
+ may be needed for GFX9-GFX11 which
changes the meaning of the
Flat Scratch Init value.
then Work-Group Id X 1 32-bit work-group id in X
@@ -4732,8 +4767,8 @@ The setting of registers is done by GPU CP/ADC/SPI hardware as follows:
:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
The global segment can be accessed either using buffer instructions (GFX6 which
-has V# 64-bit address support), flat instructions (GFX7-GFX10), or global
-instructions (GFX9-GFX10).
+has V# 64-bit address support), flat instructions (GFX7-GFX11), or global
+instructions (GFX9-GFX11).
If buffer operations are used, then the compiler can generate a V# with the
following properties:
@@ -4779,7 +4814,7 @@ GFX6-GFX8
available in dispatch packet. For M0, it is also possible to use maximum
possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
GFX7-GFX8).
-GFX9-GFX10
+GFX9-GFX11
The M0 register is not used for range checking LDS accesses and so does not
need to be initialized in the prolog.
@@ -4891,7 +4926,7 @@ There are different methods used for initializing flat scratch:
specifies *Architected flat scratch*:
If ENABLE_PRIVATE_SEGMENT is enabled in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table` then the FLAT_SCRATCH
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table` then the FLAT_SCRATCH
register pair will be initialized to the 64-bit address of the base of scratch
backing memory being managed by SPI for the queue executing the kernel
dispatch plus the value of the wave's Scratch Wavefront Offset for use as the
@@ -5000,7 +5035,7 @@ operations.
termed vector memory operations.
Private address space uses ``buffer_load/store`` using the scratch V#
-(GFX6-GFX8), or ``scratch_load/store`` (GFX9-GFX10). Since only a single thread
+(GFX6-GFX8), or ``scratch_load/store`` (GFX9-GFX11). Since only a single thread
is accessing the memory, atomic memory orderings are not meaningful, and all
accesses are treated as non-atomic.
@@ -5067,7 +5102,7 @@ following sections:
* :ref:`amdgpu-amdhsa-memory-model-gfx6-gfx9`
* :ref:`amdgpu-amdhsa-memory-model-gfx90a`
* :ref:`amdgpu-amdhsa-memory-model-gfx940`
-* :ref:`amdgpu-amdhsa-memory-model-gfx10`
+* :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
@@ -11137,12 +11172,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
- system for OpenCL.*
============ ============ ============== ========== ================================
-.. _amdgpu-amdhsa-memory-model-gfx10:
+.. _amdgpu-amdhsa-memory-model-gfx10-gfx11:
-Memory Model GFX10
-++++++++++++++++++
+Memory Model GFX10-GFX11
+++++++++++++++++++++++++
-For GFX10:
+For GFX10-GFX11:
* Each agent has multiple shader arrays (SA).
* Each SA has multiple work-group processors (WGP).
@@ -11251,7 +11286,7 @@ Wavefronts are executed in native mode with in-order reporting of loads and
sample instructions. In this mode vmcnt reports completion of load, atomic with
return and sample instructions in order, and the vscnt reports the completion of
store and atomic without return in order. See ``MEM_ORDERED`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
Wavefronts can be executed in WGP or CU wavefront execution mode:
@@ -11267,18 +11302,18 @@ Wavefronts can be executed in WGP or CU wavefront execution mode:
work-group synchronization.
See ``WGP_MODE`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table` and
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table` and
:ref:`amdgpu-target-features`.
-The code sequences used to implement the memory model for GFX10 are defined in
-table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-table`.
+The code sequences used to implement the memory model for GFX10-GFX11 are defined in
+table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
- .. table:: AMDHSA Memory Model Code Sequences GFX10
- :name: amdgpu-amdhsa-memory-model-code-sequences-gfx10-table
+ .. table:: AMDHSA Memory Model Code Sequences GFX10-GFX11
+ :name: amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table
============ ============ ============== ========== ================================
LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code
- Ordering Sync Scope Address GFX10
+ Ordering Sync Scope Address GFX10-GFX11
Space
============ ============ ============== ========== ================================
**Non-Atomic**
@@ -13308,7 +13343,7 @@ supports the ``s_trap`` instruction. For usage see:
:name: amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table
=================== =============== ================ ================= =======================================
- Usage Code Sequence GFX6-GFX8 Inputs GFX9-GFX10 Inputs Description
+ Usage Code Sequence GFX6-GFX8 Inputs GFX9-GFX11 Inputs Description
=================== =============== ================ ================= =======================================
reserved ``s_trap 0x00`` Reserved by hardware.
debugger breakpoint ``s_trap 0x01`` *none* *none* Reserved for debugger to use for
@@ -14206,7 +14241,7 @@ Assembler
---------
AMDGPU backend has LLVM-MC based assembler which is currently in development.
-It supports AMDGCN GFX6-GFX10.
+It supports AMDGCN GFX6-GFX11.
This section describes general syntax for instructions and operands.
@@ -14766,121 +14801,119 @@ terminated by an ``.end_amdhsa_kernel`` directive.
======================================================== =================== ============ ===================
Directive Default Supported On Description
======================================================== =================== ============ ===================
- ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX10 Controls GROUP_SEGMENT_FIXED_SIZE in
+ ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX11 Controls GROUP_SEGMENT_FIXED_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX10 Controls PRIVATE_SEGMENT_FIXED_SIZE in
+ ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX11 Controls PRIVATE_SEGMENT_FIXED_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_kernarg_size`` 0 GFX6-GFX10 Controls KERNARG_SIZE in
+ ``.amdhsa_kernarg_size`` 0 GFX6-GFX11 Controls KERNARG_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX10 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`
+ ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX11 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`
``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
GFX940)
- ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_PTR in
+ ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_QUEUE_PTR in
+ ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_QUEUE_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
+ ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_ID in
+ ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_ID in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
GFX940)
- ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
+ ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX11 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_wavefront_size32`` Target GFX10 Controls ENABLE_WAVEFRONT_SIZE32 in
+ ``.amdhsa_wavefront_size32`` Target GFX10-GFX11 Controls ENABLE_WAVEFRONT_SIZE32 in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
(wavefrontsize64)
``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_PRIVATE_SEGMENT in
- (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
+ (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
GFX940)
- ``.amdhsa_enable_private_segment`` 0 GFX940 Controls ENABLE_PRIVATE_SEGMENT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_X in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_Y in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_Z in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_INFO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX10 Controls ENABLE_VGPR_WORKITEM_ID in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
+ ``.amdhsa_enable_private_segment`` 0 GFX940, Controls ENABLE_PRIVATE_SEGMENT in
+ GFX11 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_X in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Y in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Z in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_INFO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX11 Controls ENABLE_VGPR_WORKITEM_ID in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`.
- ``.amdhsa_next_free_vgpr`` Required GFX6-GFX10 Maximum VGPR number explicitly referenced, plus one.
+ ``.amdhsa_next_free_vgpr`` Required GFX6-GFX11 Maximum VGPR number explicitly referenced, plus one.
Used to calculate GRANULATED_WORKITEM_VGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
- ``.amdhsa_next_free_sgpr`` Required GFX6-GFX10 Maximum SGPR number explicitly referenced, plus one.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_next_free_sgpr`` Required GFX6-GFX11 Maximum SGPR number explicitly referenced, plus one.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
``.amdhsa_accum_offset`` Required GFX90A, Offset of a first AccVGPR in the unified register file.
GFX940 Used to calculate ACCUM_OFFSET in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
- ``.amdhsa_reserve_vcc`` 1 GFX6-GFX10 Whether the kernel may use the special VCC SGPR.
+ ``.amdhsa_reserve_vcc`` 1 GFX6-GFX11 Whether the kernel may use the special VCC SGPR.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access
(except scratch memory. Used to calculate
GFX940) GRANULATED_WAVEFRONT_SGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay.
Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
- Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
(xnack)
- ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_32 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX11 Controls FLOAT_ROUND_MODE_32 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
- ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_16_64 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX11 Controls FLOAT_ROUND_MODE_16_64 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
- ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX10 Controls FLOAT_DENORM_MODE_32 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX11 Controls FLOAT_DENORM_MODE_32 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
- ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX10 Controls FLOAT_DENORM_MODE_16_64 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX11 Controls FLOAT_DENORM_MODE_16_64 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
- ``.amdhsa_dx10_clamp`` 1 GFX6-GFX10 Controls ENABLE_DX10_CLAMP in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
- ``.amdhsa_ieee_mode`` 1 GFX6-GFX10 Controls ENABLE_IEEE_MODE in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
- ``.amdhsa_fp16_overflow`` 0 GFX9-GFX10 Controls FP16_OVFL in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+ ``.amdhsa_dx10_clamp`` 1 GFX6-GFX11 Controls ENABLE_DX10_CLAMP in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_ieee_mode`` 1 GFX6-GFX11 Controls ENABLE_IEEE_MODE in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_fp16_overflow`` 0 GFX9-GFX11 Controls FP16_OVFL in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
``.amdhsa_tg_split`` Target GFX90A, Controls TG_SPLIT in
- Feature GFX940 :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
- Specific
+ Feature GFX940, :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
+ Specific GFX11
(tgsplit)
- ``.amdhsa_workgroup_processor_mode`` Target GFX10 Controls ENABLE_WGP_MODE in
+ ``.amdhsa_workgroup_processor_mode`` Target GFX10-GFX11 Controls ENABLE_WGP_MODE in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
(cumode)
- ``.amdhsa_memory_ordered`` 1 GFX10 Controls MEM_ORDERED in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
- ``.amdhsa_forward_progress`` 0 GFX10 Controls FWD_PROGRESS in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
- ``.amdhsa_shared_vgpr_count`` 0 GFX10 Controls SHARED_VGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table`.
- ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
- ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
+ ``.amdhsa_memory_ordered`` 1 GFX10-GFX11 Controls MEM_ORDERED in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_forward_progress`` 0 GFX10-GFX11 Controls FWD_PROGRESS in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
======================================================== =================== ============ ===================
.amdgpu_metadata
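
Reader's note (not part of the commit): the new GFX11 fields this patch documents for ``compute_pgm_rsrc3`` can be unpacked with a few shifts and masks. The sketch below is only an illustration of the bit layout in the table above. The function name and dictionary keys are hypothetical, the position of ``SHARED_VGPR_COUNT`` at bits 3:0 is inferred from its 0-15 range and from the next field starting at bit 4, and reading "granularity of 128 bytes" as "field value times 128" is an assumption.

```python
def decode_compute_pgm_rsrc3_gfx11(value: int) -> dict:
    """Split a 32-bit compute_pgm_rsrc3 word into the GFX11 fields.

    Hypothetical helper for illustration; the layout follows the table in
    the diff above (bits 9:4, 10, 11, 30:12 and 31).
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("compute_pgm_rsrc3 is a 4-byte value")

    # Bits 30:12 (19 bits) are documented as reserved, must be 0.
    if (value >> 12) & 0x7FFFF:
        raise ValueError("bits 30:12 are reserved and must be 0")

    inst_pref_size = (value >> 4) & 0x3F  # bits 9:4, value 0..63

    return {
        # Assumed to occupy bits 3:0 (value 0-15 for wavefront size 64).
        "shared_vgpr_count": value & 0xF,
        "inst_pref_size": inst_pref_size,
        # Assumption: prefetch length in bytes is the field value * 128.
        "inst_pref_bytes": inst_pref_size * 128,
        "trap_on_start": bool((value >> 10) & 1),  # bit 10
        "trap_on_end": bool((value >> 11) & 1),    # bit 11
        "image_op": bool((value >> 31) & 1),       # bit 31
    }


if __name__ == "__main__":
    # IMAGE_OP set, INST_PREF_SIZE = 3, SHARED_VGPR_COUNT = 5.
    print(decode_compute_pgm_rsrc3_gfx11((1 << 31) | (3 << 4) | 5))
```

Per the table, ``TRAP_ON_START`` and ``TRAP_ON_END`` must be 0 in the code object and are filled in by CP, so a decoder like this would normally see them clear when reading a kernel descriptor.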