[llvm-branch-commits] [llvm] [AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs (PR #126887)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Wed Feb 12 02:53:07 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-amdgpu
Author: Fabian Ritter (ritter-x2a)
<details>
<summary>Changes</summary>
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.
For SWDEV-512631
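As a rough way to check the claim above locally, the sketch below scans a docs tree for leftover mentions of the two retired targets. It is not part of this PR: the helper name `find_retired_targets`, the `RETIRED` pattern, and the choice to scan only `.rst` files are assumptions made for illustration. Hits that come from the `AMDGPUAsmGFX940` ISA description (including `:doc:` links to it) are expected, since that document is deliberately left for a separate PR.

```python
import re
from pathlib import Path

# Hypothetical pattern, not from the PR: matches gfx940 and gfx941 in any case.
RETIRED = re.compile(r"gfx94[01]", re.IGNORECASE)


def find_retired_targets(docs_root):
    """Return (path, line_number, stripped_line) for each .rst line that
    still names gfx940 or gfx941 under docs_root."""
    hits = []
    for path in sorted(Path(docs_root).rglob("*.rst")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if RETIRED.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumes the script is run from the root of an llvm-project checkout.
    for path, lineno, line in find_retired_targets("llvm/docs"):
        print(f"{path}:{lineno}: {line}")
```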
---
Patch is 24.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126887.diff
2 Files Affected:
- (modified) llvm/docs/AMDGPUOperandSyntax.rst (+2-2)
- (modified) llvm/docs/AMDGPUUsage.rst (+32-65)
``````````diff
diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst
index ff6ec6cf71ff2..e8a76322fe76a 100644
--- a/llvm/docs/AMDGPUOperandSyntax.rst
+++ b/llvm/docs/AMDGPUOperandSyntax.rst
@@ -63,7 +63,7 @@ Note: *N* and *K* must satisfy the following conditions:
* 0 <= *K* <= 255.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
-GFX90A and GFX940 have an additional alignment requirement:
+GFX90A and GFX942 have an additional alignment requirement:
pairs of *vector* registers must be even-aligned
(first register must be even).
@@ -183,7 +183,7 @@ Note: *N* and *K* must satisfy the following conditions:
* 0 <= *K* <= 255.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
-GFX90A and GFX940 have an additional alignment requirement:
+GFX90A and GFX942 have an additional alignment requirement:
pairs of *accumulator* registers must be even-aligned
(first register must be even).
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 83ec1eecb6e5e..14b3b6fce9e70 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -323,7 +323,7 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
Add product
names.
- **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX940-GFX942-CDNA3]_
+ **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX942-CDNA3]_
-----------------------------------------------------------------------------------------------------------------------
``gfx900`` ``amdgcn`` dGPU - xnack - Absolute - *rocm-amdhsa* - Radeon Vega
flat - *pal-amdhsa* Frontier Edition
@@ -378,20 +378,6 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
- Ryzen 3 Pro 4350G
- Ryzen 3 Pro 4350GE
- ``gfx940`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
- - tgsplit flat
- - xnack scratch .. TODO::
- - kernarg preload - Packed
- work-item Add product
- IDs names.
-
- ``gfx941`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
- - tgsplit flat
- - xnack scratch .. TODO::
- - kernarg preload - Packed
- work-item Add product
- IDs names.
-
``gfx942`` ``amdgcn`` dGPU - sramecc - Architected - AMD Instinct MI300X
- tgsplit flat - AMD Instinct MI300A
- xnack scratch
@@ -583,10 +569,10 @@ Generic processor code objects are versioned. See :ref:`amdgpu-generic-processor
- ``v_dot2_f32_f16``
- ``gfx9-4-generic`` ``amdgcn`` - ``gfx940`` - sramecc - Architected FP8 and BF8 instructions,
- - ``gfx941`` - tgsplit flat scratch FP8 and BF8 conversion
- - ``gfx942`` - xnack - Packed instructions, as well as
- - ``gfx950`` - kernarg preload work-item instructions with XF32 format
+ ``gfx9-4-generic`` ``amdgcn`` - ``gfx942`` - sramecc - Architected FP8 and BF8 instructions,
+ - ``gfx950`` - tgsplit flat scratch FP8 and BF8 conversion
+ - xnack - Packed instructions, as well as
+ - kernarg preload work-item instructions with XF32 format
IDs support are not available.
``gfx10-1-generic`` ``amdgcn`` - ``gfx1010`` - xnack - Absolute flat - The following instructions are
@@ -4974,7 +4960,7 @@ The fields used by CP for code objects before V3 also match those specified in
bytes
383:352 4 bytes COMPUTE_PGM_RSRC3 GFX6-GFX9
Reserved, must be 0.
- GFX90A, GFX940
+ GFX90A, GFX942
Compute Shader (CS)
program settings used by
CP to set up
@@ -5059,7 +5045,7 @@ The fields used by CP for code objects before V3 also match those specified in
463:460 4 bits Reserved, must be 0.
470:464 7 bits KERNARG_PRELOAD_SPEC_LENGTH GFX6-GFX9
- Reserved, must be 0.
- GFX90A, GFX940
+ GFX90A, GFX942
- The number of dwords from
the kernarg segment to preload
into User SGPRs before kernel
@@ -5067,7 +5053,7 @@ The fields used by CP for code objects before V3 also match those specified in
:ref:`amdgpu-amdhsa-kernarg-preload`).
479:471 9 bits KERNARG_PRELOAD_SPEC_OFFSET GFX6-GFX9
- Reserved, must be 0.
- GFX90A, GFX940
+ GFX90A, GFX942
- An offset in dwords into the
kernarg segment to begin
preloading data into User
@@ -5093,7 +5079,7 @@ The fields used by CP for code objects before V3 also match those specified in
GFX6-GFX9
- vgprs_used 0..256
- max(0, ceil(vgprs_used / 4) - 1)
- GFX90A, GFX940
+ GFX90A, GFX942
- vgprs_used 0..512
- vgprs_used = align(arch_vgprs, 4)
+ acc_vgprs
@@ -5559,7 +5545,7 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc3 for GFX90A, GFX940
+ .. table:: compute_pgm_rsrc3 for GFX90A, GFX942
:name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table
======= ======= =============================== ===========================================================================
@@ -9970,15 +9956,15 @@ only accessed by a single thread, and is always write-before-read, there is
never a need to invalidate these entries from the L1 cache. Hence all cache
invalidates are done as ``*_vol`` to only invalidate the volatile cache lines.
-The code sequences used to implement the memory model for GFX940, GFX941, GFX942
-are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table`.
+The code sequences used to implement the memory model for GFX942 are defined in
+table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx942-table`.
- .. table:: AMDHSA Memory Model Code Sequences GFX940, GFX941, GFX942
- :name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table
+ .. table:: AMDHSA Memory Model Code Sequences GFX942
+ :name: amdgpu-amdhsa-memory-model-code-sequences-gfx942-table
============ ============ ============== ========== ================================
LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code
- Ordering Sync Scope Address GFX940, GFX941, GFX942
+ Ordering Sync Scope Address GFX942
Space
============ ============ ============== ========== ================================
**Non-Atomic**
@@ -10013,18 +9999,12 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
load *none* *none* - local 1. ds_load
store *none* *none* - global - !volatile & !nontemporal
- generic
- - private 1. GFX940, GFX941
+ - private 1. GFX942
- constant buffer/global/flat_store
- sc0=1 sc1=1
- GFX942
- buffer/global/flat_store
- !volatile & nontemporal
- 1. GFX940, GFX941
- buffer/global/flat_store
- nt=1 sc0=1 sc1=1
- GFX942
+ 1. GFX942
buffer/global/flat_store
nt=1
@@ -10696,11 +10676,8 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
**Release Atomic**
------------------------------------------------------------------------------------
- store atomic release - singlethread - global 1. GFX940, GFX941
+ store atomic release - singlethread - global 1. GFX942
- wavefront - generic buffer/global/flat_store
- sc0=1 sc1=1
- GFX942
- buffer/global/flat_store
store atomic release - singlethread - local *If TgSplit execution mode,
- wavefront local address space cannot
@@ -10738,10 +10715,7 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
store that is being
released.
- 2. GFX940, GFX941
- buffer/global/flat_store
- sc0=1 sc1=1
- GFX942
+ 2. GFX942
buffer/global/flat_store
sc0=1
store atomic release - workgroup - local *If TgSplit execution mode,
@@ -10802,10 +10776,7 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
store that is being
released.
- 3. GFX940, GFX941
- buffer/global/flat_store
- sc0=1 sc1=1
- GFX942
+ 3. GFX942
buffer/global/flat_store
sc1=1
store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
@@ -17563,11 +17534,7 @@ in this description.
CDNA 2 :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>` :doc:`gfx90a<AMDGPU/AMDGPUAsmGFX90a>`
- CDNA 3 :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>` :doc:`gfx940<AMDGPU/AMDGPUAsmGFX940>`
-
- :doc:`gfx941<AMDGPU/AMDGPUAsmGFX940>`
-
- :doc:`gfx942<AMDGPU/AMDGPUAsmGFX940>`
+ CDNA 3 :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>` :doc:`gfx942<AMDGPU/AMDGPUAsmGFX940>`
RDNA 1 :doc:`GFX10 RDNA1<AMDGPU/AMDGPUAsmGFX10>` :doc:`gfx1010<AMDGPU/AMDGPUAsmGFX10>`
@@ -17605,7 +17572,7 @@ combinations of operands, refer to one of instruction set architecture manuals
[AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_,
[AMD-GCN-GFX900-GFX904-VEGA]_, [AMD-GCN-GFX906-VEGA7NM]_,
[AMD-GCN-GFX908-CDNA1]_, [AMD-GCN-GFX90A-CDNA2]_,
-[AMD-GCN-GFX940-GFX942-CDNA3]_, [AMD-GCN-GFX10-RDNA1]_, [AMD-GCN-GFX10-RDNA2]_,
+[AMD-GCN-GFX942-CDNA3]_, [AMD-GCN-GFX10-RDNA1]_, [AMD-GCN-GFX10-RDNA2]_,
[AMD-GCN-GFX11-RDNA3]_ and [AMD-GCN-GFX11-RDNA3.5]_.
Operands
@@ -18118,7 +18085,7 @@ terminated by an ``.end_amdhsa_kernel`` directive.
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`
``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- GFX940)
+ GFX942)
``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_DISPATCH_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_QUEUE_PTR in
@@ -18129,7 +18096,7 @@ terminated by an ``.end_amdhsa_kernel`` directive.
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- GFX940)
+ GFX942)
``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
@@ -18140,8 +18107,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_PRIVATE_SEGMENT in
(except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
- GFX940)
- ``.amdhsa_enable_private_segment`` 0 GFX940, Controls ENABLE_PRIVATE_SEGMENT in
+ GFX942)
+ ``.amdhsa_enable_private_segment`` 0 GFX942, Controls ENABLE_PRIVATE_SEGMENT in
GFX11-GFX12 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX12 Controls ENABLE_SGPR_WORKGROUP_ID_X in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
@@ -18162,14 +18129,14 @@ terminated by an ``.end_amdhsa_kernel`` directive.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_accum_offset`` Required GFX90A, Offset of a first AccVGPR in the unified register file.
- GFX940 Used to calculate ACCUM_OFFSET in
+ GFX942 Used to calculate ACCUM_OFFSET in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
``.amdhsa_reserve_vcc`` 1 GFX6-GFX12 Whether the kernel may use the special VCC SGPR.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access
...
[truncated]
``````````
</details>
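For readers unfamiliar with the register-range rule touched in `AMDGPUOperandSyntax.rst`, here is a minimal sketch of the conditions visible in the hunk context for a range `v[N:K]`. It covers only the conditions shown in the diff (the full document may list more), it assumes `N` is non-negative, and it reads "pairs" literally as ranges of exactly two registers. The function name `check_register_range` and the `even_aligned_pairs` flag are hypothetical, not LLVM APIs.

```python
# Widths above 12 that the quoted documentation still allows.
ALLOWED_WIDE_WIDTHS = {16, 32}


def check_register_range(n, k, even_aligned_pairs=False):
    """Check a register range [n:k] against the conditions quoted in the diff.

    even_aligned_pairs models the extra GFX90A/GFX942 requirement that the
    first register of a pair must be even. Applying it only to width-2
    ranges is an assumption of this sketch.
    """
    width = k - n + 1
    # 0 <= K <= 255 is from the diff; n >= 0 is an assumption of this sketch.
    if n < 0 or not (0 <= k <= 255):
        return False
    # K-N+1 must be in the range 1 to 12, or equal to 16 or 32.
    if not (1 <= width <= 12 or width in ALLOWED_WIDE_WIDTHS):
        return False
    # Pairs must be even-aligned when the target requires it.
    if even_aligned_pairs and width == 2 and n % 2 != 0:
        return False
    return True
```

Under these assumptions, `check_register_range(1, 2)` passes the generic checks, while `check_register_range(1, 2, even_aligned_pairs=True)` is rejected because the pair starts at an odd register.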
https://github.com/llvm/llvm-project/pull/126887