[llvm-branch-commits] [llvm] 8c6d516 - [NFC][AMDGPU] Reorganize description of scratch handling
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Dec 17 11:38:38 PST 2020
Author: Tony
Date: 2020-12-17T19:33:14Z
New Revision: 8c6d516286d5eb51899f380526c59e8b7af69f24
URL: https://github.com/llvm/llvm-project/commit/8c6d516286d5eb51899f380526c59e8b7af69f24
DIFF: https://github.com/llvm/llvm-project/commit/8c6d516286d5eb51899f380526c59e8b7af69f24.diff
LOG: [NFC][AMDGPU] Reorganize description of scratch handling
Differential Revision: https://reviews.llvm.org/D93440
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c8dda47352ab..3dbdfa7764dc 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -107,21 +107,21 @@ specific information.
.. table:: AMDGPU Processors
:name: amdgpu-processor-table
- =========== =============== ============ ===== ================= =========== =============== ======================
- Processor Alternative Target dGPU/ Target Target OS Support Example
- Processor Triple APU Features Properties *(see* Products
- Architecture Supported `amdgpu-os`_
- *and
- corresponding
- runtime release
- notes for
- current
- information and
- level of
- support)*
- =========== =============== ============ ===== ================= =========== =============== ======================
+ =========== =============== ============ ===== ================= =============== =============== ======================
+ Processor Alternative Target dGPU/ Target Target OS Support Example
+ Processor Triple APU Features Properties *(see* Products
+ Architecture Supported `amdgpu-os`_
+ *and
+ corresponding
+ runtime release
+ notes for
+ current
+ information and
+ level of
+ support)*
+ =========== =============== ============ ===== ================= =============== =============== ======================
**Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
- -------------------------------------------------------------------------------------------------------------------
+ -----------------------------------------------------------------------------------------------------------------------
``r600`` ``r600`` dGPU - Does not
support
generic
@@ -143,7 +143,7 @@ specific information.
address
space
**Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
- -------------------------------------------------------------------------------------------------------------------
+ -----------------------------------------------------------------------------------------------------------------------
``rv710`` ``r600`` dGPU - Does not
support
generic
@@ -160,7 +160,7 @@ specific information.
address
space
**Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
- -------------------------------------------------------------------------------------------------------------------
+ -----------------------------------------------------------------------------------------------------------------------
``cedar`` ``r600`` dGPU - Does not
support
generic
@@ -187,7 +187,7 @@ specific information.
address
space
**Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
- -------------------------------------------------------------------------------------------------------------------
+ -----------------------------------------------------------------------------------------------------------------------
``barts`` ``r600`` dGPU - Does not
support
generic
@@ -209,208 +209,208 @@ specific information.
address
space
**GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
- -------------------------------------------------------------------------------------------------------------------
- ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
+ -----------------------------------------------------------------------------------------------------------------------
+ ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
support
generic
address
space
- ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
+ ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
- ``verde`` support
generic
address
space
- ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
+ ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
- ``oland`` support
generic
address
space
**GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
- -------------------------------------------------------------------------------------------------------------------
- ``gfx700`` - ``kaveri`` ``amdgcn`` APU - *rocm-amdhsa* - A6-7000
- - *pal-amdhsa* - A6 Pro-7050B
- - *pal-amdpal* - A8-7100
- - A8 Pro-7150B
- - A10-7300
- - A10 Pro-7350B
- - FX-7500
- - A8-7200P
- - A10-7400P
- - FX-7600P
- ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro W8100
- - *pal-amdhsa* - FirePro W9100
- - *pal-amdpal* - FirePro S9150
- - FirePro S9170
- ``gfx702`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 290
- - *pal-amdhsa* - Radeon R9 290x
- - *pal-amdpal* - Radeon R390
- - Radeon R390x
- ``gfx703`` - ``kabini`` ``amdgcn`` APU - *pal-amdhsa* - E1-2100
- - ``mullins`` - *pal-amdpal* - E1-2200
- - E1-2500
- - E2-3000
- - E2-3800
- - A4-5000
- - A4-5100
- - A6-5200
- - A4 Pro-3340B
- ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - *pal-amdhsa* - Radeon HD 7790
- - *pal-amdpal* - Radeon HD 8770
- - R7 260
- - R7 260X
- ``gfx705`` ``amdgcn`` APU - *pal-amdhsa* *TBA*
- - *pal-amdpal*
- .. TODO::
-
- Add product
- names.
+ -----------------------------------------------------------------------------------------------------------------------
+ ``gfx700`` - ``kaveri`` ``amdgcn`` APU - Offset - *rocm-amdhsa* - A6-7000
+ flat - *pal-amdhsa* - A6 Pro-7050B
+ scratch - *pal-amdpal* - A8-7100
+ - A8 Pro-7150B
+ - A10-7300
+ - A10 Pro-7350B
+ - FX-7500
+ - A8-7200P
+ - A10-7400P
+ - FX-7600P
+ ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro W8100
+ flat - *pal-amdhsa* - FirePro W9100
+ scratch - *pal-amdpal* - FirePro S9150
+ - FirePro S9170
+ ``gfx702`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 290
+ flat - *pal-amdhsa* - Radeon R9 290x
+ scratch - *pal-amdpal* - Radeon R390
+ - Radeon R390x
+ ``gfx703`` - ``kabini`` ``amdgcn`` APU - Offset - *pal-amdhsa* - E1-2100
+ - ``mullins`` flat - *pal-amdpal* - E1-2200
+ scratch - E1-2500
+ - E2-3000
+ - E2-3800
+ - A4-5000
+ - A4-5100
+ - A6-5200
+ - A4 Pro-3340B
+ ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Offset - *pal-amdhsa* - Radeon HD 7790
+ flat - *pal-amdpal* - Radeon HD 8770
+ scratch - R7 260
+ - R7 260X
+ ``gfx705`` ``amdgcn`` APU - Offset - *pal-amdhsa* *TBA*
+ flat - *pal-amdpal*
+ scratch .. TODO::
+
+ Add product
+ names.
**GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
- -------------------------------------------------------------------------------------------------------------------
- ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - A6-8500P
- - *pal-amdhsa* - Pro A6-8500B
- - *pal-amdpal* - A8-8600P
- - Pro A8-8600B
- - FX-8800P
- - Pro A12-8800B
- - A10-8700P
- - Pro A10-8700B
- - A10-8780P
- - A10-9600P
- - A10-9630P
- - A12-9700P
- - A12-9730P
- - FX-9800P
- - FX-9830P
- - E2-9010
- - A6-9210
- - A9-9410
- ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 285
- - ``tonga`` - *pal-amdhsa* - Radeon R9 380
- - *pal-amdpal* - Radeon R9 385
- ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano
- - *pal-amdhsa* - Radeon R9 Fury
- - *pal-amdpal* - Radeon R9 FuryX
- - Radeon Pro Duo
- - FirePro S9300x2
- - Radeon Instinct MI8
- \ - ``polaris10`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 470
- - *pal-amdhsa* - Radeon RX 480
- - *pal-amdpal* - Radeon Instinct MI6
- \ - ``polaris11`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 460
- - *pal-amdhsa*
- - *pal-amdpal*
- ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro S7150
- - *pal-amdhsa* - FirePro S7100
- - *pal-amdpal* - FirePro W7100
- - Mobile FirePro
- M7170
- ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - *rocm-amdhsa* *TBA*
- - *pal-amdhsa*
- - *pal-amdpal* .. TODO::
-
- Add product
- names.
+ -----------------------------------------------------------------------------------------------------------------------
+ ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* - A6-8500P
+ flat - *pal-amdhsa* - Pro A6-8500B
+ scratch - *pal-amdpal* - A8-8600P
+ - Pro A8-8600B
+ - FX-8800P
+ - Pro A12-8800B
+ - A10-8700P
+ - Pro A10-8700B
+ - A10-8780P
+ - A10-9600P
+ - A10-9630P
+ - A12-9700P
+ - A12-9730P
+ - FX-9800P
+ - FX-9830P
+ - E2-9010
+ - A6-9210
+ - A9-9410
+ ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 285
+ - ``tonga`` flat - *pal-amdhsa* - Radeon R9 380
+ scratch - *pal-amdpal* - Radeon R9 385
+ ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano
+ - *pal-amdhsa* - Radeon R9 Fury
+ - *pal-amdpal* - Radeon R9 FuryX
+ - Radeon Pro Duo
+ - FirePro S9300x2
+ - Radeon Instinct MI8
+ \ - ``polaris10`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 470
+ flat - *pal-amdhsa* - Radeon RX 480
+ scratch - *pal-amdpal* - Radeon Instinct MI6
+ \ - ``polaris11`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 460
+ flat - *pal-amdhsa*
+ scratch - *pal-amdpal*
+ ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro S7150
+ flat - *pal-amdhsa* - FirePro S7100
+ scratch - *pal-amdpal* - FirePro W7100
+ - Mobile FirePro
+ M7170
+ ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* *TBA*
+ flat - *pal-amdhsa*
+ scratch - *pal-amdpal* .. TODO::
+
+ Add product
+ names.
**GCN GFX9 (Vega)** [AMD-GCN-GFX9]_
- -------------------------------------------------------------------------------------------------------------------
- ``gfx900`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* - Radeon Vega
- - *pal-amdhsa* Frontier Edition
- - *pal-amdpal* - Radeon RX Vega 56
- - Radeon RX Vega 64
- - Radeon RX Vega 64
- Liquid
- - Radeon Instinct MI25
- ``gfx902`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - Ryzen 3 2200G
- - *pal-amdhsa* - Ryzen 5 2400G
- - *pal-amdpal*
- ``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA*
- - *pal-amdhsa*
- - *pal-amdpal* .. TODO::
-
- Add product
- names.
-
- ``gfx906`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* - Radeon Instinct MI50
- - xnack - *pal-amdhsa* - Radeon Instinct MI60
- - *pal-amdpal* - Radeon VII
- - Radeon Pro VII
- ``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA*
- - xnack
- .. TODO::
-
- Add product
- names.
-
- ``gfx909`` ``amdgcn`` APU - xnack - *pal-amdpal* *TBA*
-
- .. TODO::
-
- Add product
- names.
-
- ``gfx90c`` ``amdgcn`` APU - xnack - *pal-amdpal* - Ryzen 7 4700G
- - Ryzen 7 4700GE
- - Ryzen 5 4600G
- - Ryzen 5 4600GE
- - Ryzen 3 4300G
- - Ryzen 3 4300GE
- - Ryzen Pro 4000G
- - Ryzen 7 Pro 4700G
- - Ryzen 7 Pro 4750GE
- - Ryzen 5 Pro 4650G
- - Ryzen 5 Pro 4650GE
- - Ryzen 3 Pro 4350G
- - Ryzen 3 Pro 4350GE
+ -----------------------------------------------------------------------------------------------------------------------
+ ``gfx900`` ``amdgcn`` dGPU - xnack - Absolute - *rocm-amdhsa* - Radeon Vega
+ flat - *pal-amdhsa* Frontier Edition
+ scratch - *pal-amdpal* - Radeon RX Vega 56
+ - Radeon RX Vega 64
+ - Radeon RX Vega 64
+ Liquid
+ - Radeon Instinct MI25
+ ``gfx902`` ``amdgcn`` APU - xnack - Absolute - *rocm-amdhsa* - Ryzen 3 2200G
+ flat - *pal-amdhsa* - Ryzen 5 2400G
+ scratch - *pal-amdpal*
+ ``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA*
+ - *pal-amdhsa*
+ - *pal-amdpal* .. TODO::
+
+ Add product
+ names.
+
+ ``gfx906`` ``amdgcn`` dGPU - sramecc - Absolute - *rocm-amdhsa* - Radeon Instinct MI50
+ - xnack flat - *pal-amdhsa* - Radeon Instinct MI60
+ scratch - *pal-amdpal* - Radeon VII
+ - Radeon Pro VII
+ ``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA*
+ - xnack - Absolute
+ flat .. TODO::
+ scratch
+ Add product
+ names.
+
+ ``gfx909`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* *TBA*
+ flat
+ scratch .. TODO::
+
+ Add product
+ names.
+
+ ``gfx90c`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* - Ryzen 7 4700G
+ flat - Ryzen 7 4700GE
+ scratch - Ryzen 5 4600G
+ - Ryzen 5 4600GE
+ - Ryzen 3 4300G
+ - Ryzen 3 4300GE
+ - Ryzen Pro 4000G
+ - Ryzen 7 Pro 4700G
+ - Ryzen 7 Pro 4750GE
+ - Ryzen 5 Pro 4650G
+ - Ryzen 5 Pro 4650GE
+ - Ryzen 3 Pro 4350G
+ - Ryzen 3 Pro 4350GE
**GCN GFX10 (RDNA 1)** [AMD-GCN-GFX10-RDNA1]_
- -------------------------------------------------------------------------------------------------------------------
- ``gfx1010`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5700
- - wavefrontsize64 - *pal-amdhsa* - Radeon RX 5700 XT
- - xnack - *pal-amdpal* - Radeon Pro 5600 XT
- - Radeon Pro 5600M
- ``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- - wavefrontsize64 - *pal-amdhsa*
- - xnack - *pal-amdpal*
- .. TODO::
-
- Add product
- names.
-
- ``gfx1012`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5500
- - wavefrontsize64 - *pal-amdhsa* - Radeon RX 5500 XT
- - xnack - *pal-amdpal*
+ -----------------------------------------------------------------------------------------------------------------------
+ ``gfx1010`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5700
+ - wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5700 XT
+ - xnack scratch - *pal-amdpal* - Radeon Pro 5600 XT
+ - Radeon Pro 5600M
+ ``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
+ - wavefrontsize64 - Absolute - *pal-amdhsa*
+ - xnack flat - *pal-amdpal*
+ scratch .. TODO::
+
+ Add product
+ names.
+
+ ``gfx1012`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5500
+ - wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5500 XT
+ - xnack scratch - *pal-amdpal*
**GCN GFX10 (RDNA 2)** [AMD-GCN-GFX10-RDNA2]_
- -------------------------------------------------------------------------------------------------------------------
- ``gfx1030`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- - wavefrontsize64 - *pal-amdhsa*
- - *pal-amdpal* .. TODO::
+ -----------------------------------------------------------------------------------------------------------------------
+ ``gfx1030`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
+ - wavefrontsize64 flat - *pal-amdhsa*
+ scratch - *pal-amdpal* .. TODO::
- Add product
- names.
+ Add product
+ names.
- ``gfx1031`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- - wavefrontsize64 - *pal-amdhsa*
- - *pal-amdpal* .. TODO::
+ ``gfx1031`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
+ - wavefrontsize64 flat - *pal-amdhsa*
+ scratch - *pal-amdpal* .. TODO::
- Add product
- names.
+ Add product
+ names.
- ``gfx1032`` ``amdgcn`` dGPU - cumode - *pal-amdhsa* *TBA*
- - wavefrontsize64 - *pal-amdhsa*
- - *pal-amdpal* .. TODO::
+ ``gfx1032`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
+ - wavefrontsize64 flat - *pal-amdhsa*
+ scratch - *pal-amdpal* .. TODO::
- Add product
- names.
+ Add product
+ names.
- ``gfx1033`` ``amdgcn`` APU - cumode - *pal-amdpal* *TBA*
- - wavefrontsize64
- .. TODO::
+ ``gfx1033`` ``amdgcn`` APU - cumode - Absolute - *pal-amdpal* *TBA*
+ - wavefrontsize64 flat
+ scratch .. TODO::
- Add product
- names.
+ Add product
+ names.
- =========== =============== ============ ===== ================= =========== =============== ======================
+ =========== =============== ============ ===== ================= =============== =============== ======================
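The new *Target Properties* entries determine which flat scratch initialization method a processor uses, as described in the Flat Scratch part of the kernel prolog description. The following Python sketch is illustrative only. It encodes a hand-copied subset of the table rows as a lookup. The names ``FlatScratch`` and ``flat_scratch_method`` are hypothetical. Processors whose rows in this revision list no flat scratch property, such as ``gfx803`` and ``gfx904``, are left out.

```python
from enum import Enum


class FlatScratch(Enum):
    """Flat scratch handling as named in the Target Properties column."""
    UNSUPPORTED = "Does not support generic address space"
    OFFSET = "Offset flat scratch"
    ABSOLUTE = "Absolute flat scratch"


# Hand-copied subset of the AMDGPU Processors table (new revision only).
_ROWS = {
    FlatScratch.UNSUPPORTED: ["gfx600", "gfx601", "gfx602"],
    FlatScratch.OFFSET: ["gfx700", "gfx701", "gfx702", "gfx703", "gfx704",
                         "gfx705", "gfx801", "gfx802", "gfx805", "gfx810"],
    FlatScratch.ABSOLUTE: ["gfx900", "gfx902", "gfx906", "gfx908", "gfx909",
                           "gfx90c", "gfx1010", "gfx1011", "gfx1012",
                           "gfx1030", "gfx1031", "gfx1032", "gfx1033"],
}

# Invert the rows into a processor -> method lookup.
TARGET_FLAT_SCRATCH = {
    proc: method for method, procs in _ROWS.items() for proc in procs
}


def flat_scratch_method(processor: str) -> FlatScratch:
    """Return the flat scratch method for a processor in the subset above."""
    try:
        return TARGET_FLAT_SCRATCH[processor]
    except KeyError:
        raise ValueError(f"{processor!r} is not in this subset") from None
```

For example, ``flat_scratch_method("gfx810")`` yields the Offset method and ``flat_scratch_method("gfx1030")`` yields the Absolute method.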
.. _amdgpu-target-features:
@@ -4162,18 +4162,9 @@ SGPR register initial state is defined in
(kernel descriptor enable of
field) SGPRs
========== ========================== ====== ==============================
- First Private Segment Buffer 4 This is 4 SGPRs:
- (enable_sgpr_private
+ First Private Segment Buffer 4 See
+ (enable_sgpr_private :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`.
_segment_buffer)
- V# that can be used,
- together with Scratch
- Wavefront Offset as an
- offset, to access the
- private memory space using a
- segment address.
-
- CP uses the value provided
- by the runtime.
then Dispatch Ptr 2 64-bit address of AQL dispatch
(enable_sgpr_dispatch_ptr) packet for kernel dispatch
actually executing.
@@ -4193,87 +4184,8 @@ SGPR register initial state is defined in
then Dispatch Id 2 64-bit Dispatch ID of the
(enable_sgpr_dispatch_id) dispatch packet being
executed.
- then Flat Scratch Init 2 This is 2 SGPRs:
- (enable_sgpr_flat_scratch
- _init) GFX6
- Not supported.
- GFX7-GFX8
- The first SGPR is a 32-bit
- byte offset from
- ``SH_HIDDEN_PRIVATE_BASE_VIMID``
- to per SPI base of memory
- for scratch for the queue
- executing the kernel
- dispatch. CP obtains this
- from the runtime. (The
- Scratch Segment Buffer base
- address is
- ``SH_HIDDEN_PRIVATE_BASE_VIMID``
- plus this offset.) The value
- of Scratch Wavefront Offset must
- be added to this offset by
- the kernel machine code,
- right shifted by 8, and
- moved to the FLAT_SCRATCH_HI
- SGPR register.
- FLAT_SCRATCH_HI corresponds
- to SGPRn-4 on GFX7, and
- SGPRn-6 on GFX8 (where SGPRn
- is the highest numbered SGPR
- allocated to the wavefront).
- FLAT_SCRATCH_HI is
- multiplied by 256 (as it is
- in units of 256 bytes) and
- added to
- ``SH_HIDDEN_PRIVATE_BASE_VIMID``
- to calculate the per wavefront
- FLAT SCRATCH BASE in flat
- memory instructions that
- access the scratch
- aperture.
-
- The second SGPR is 32-bit
- byte size of a single
- work-item's scratch memory
- usage. CP obtains this from
- the runtime, and it is
- always a multiple of DWORD.
- CP checks that the value in
- the kernel dispatch packet
- Private Segment Byte Size is
- not larger and requests the
- runtime to increase the
- queue's scratch size if
- necessary. The kernel code
- must move it to
- FLAT_SCRATCH_LO which is
- SGPRn-3 on GFX7 and SGPRn-5
- on GFX8. FLAT_SCRATCH_LO is
- used as the FLAT SCRATCH
- SIZE in flat memory
- instructions. Having CP load
- it once avoids loading it at
- the beginning of every
- wavefront.
- GFX9-GFX10
- This is the
- 64-bit base address of the
- per SPI scratch backing
- memory managed by SPI for
- the queue executing the
- kernel dispatch. CP obtains
- this from the runtime (and
- divides it if there are
- multiple Shader Arrays each
- with its own SPI). The value
- of Scratch Wavefront Offset must
- be added by the kernel
- machine code and the result
- moved to the FLAT_SCRATCH
- SGPR which is SGPRn-6 and
- SGPRn-5. It is used as the
- FLAT SCRATCH BASE in flat
- memory instructions.
+ then Flat Scratch Init 2 See
+ :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
then Private Segment Size 1 The 32-bit byte size of a
(enable_sgpr_private single
work-item's
@@ -4338,19 +4250,10 @@ SGPR register initial state is defined in
then Work-Group Info 1 {first_wavefront, 14'b0000,
(enable_sgpr_workgroup ordered_append_term[10:0],
_info) threadgroup_size_in_wavefronts[5:0]}
- then Scratch Wavefront Offset 1 This is 1 SGPR:
- (enable_sgpr_private
- _segment_wavefront_offset)
- 32-bit byte offset from base
- of scratch base of queue
- executing the kernel
- dispatch. Must be used as an
- offset with Private segment
- address when using Scratch
- Segment Buffer. It must be
- used to set up FLAT SCRATCH
- for flat addressing (see
- :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`).
+ then Scratch Wavefront Offset 1 See
+ (enable_sgpr_private :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
+ _segment_wavefront_offset) and
+ :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`.
========== ========================== ====== ==============================
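The "First"/"then" ordering in the table above fixes where each enabled register lands. As a minimal sketch, the following Python code assumes that enabled registers are packed densely from SGPR0 in table order and that disabled registers receive no SGPR. It lists only the rows visible in this diff. Rows elided between the hunks are omitted, so the result matches a real layout only when those elided registers are disabled. ``VISIBLE_SGPR_ROWS`` and ``assign_sgprs`` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (register, SGPR count) in the order the table lists them. Only rows
# visible in this diff are included; rows hidden by hunk boundaries are not.
VISIBLE_SGPR_ROWS: List[Tuple[str, int]] = [
    ("Private Segment Buffer", 4),
    ("Dispatch Ptr", 2),
    ("Dispatch Id", 2),
    ("Flat Scratch Init", 2),
    ("Private Segment Size", 1),
    ("Work-Group Info", 1),
    ("Scratch Wavefront Offset", 1),
]


def assign_sgprs(rows: Iterable[Tuple[str, int]],
                 enabled: Set[str]) -> Dict[str, Tuple[int, int]]:
    """Map each enabled register to its inclusive (first, last) SGPR index.

    Assumes dense packing from SGPR0 in table order; disabled registers
    are skipped and consume no SGPRs.
    """
    layout: Dict[str, Tuple[int, int]] = {}
    next_sgpr = 0
    for name, count in rows:
        if name in enabled:
            layout[name] = (next_sgpr, next_sgpr + count - 1)
            next_sgpr += count
    return layout
```

Enabling only the Private Segment Buffer, Flat Scratch Init and Scratch Wavefront Offset places them in SGPR0-3, SGPR4-5 and SGPR6 respectively.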
The order of the VGPR registers is defined, but the compiler can specify which
@@ -4390,12 +4293,11 @@ The setting of registers is done by GPU CP/ADC/SPI hardware as follows:
combination including none.
3. Scratch Wavefront Offset is set by SPI on a per-wavefront basis, which is why
its value cannot be included with the flat scratch init value, which is per
- queue.
+ queue (see :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`).
4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y)
or (X, Y, Z).
-
-See :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch` for Flat Scratch register
-pair initialization.
+5. Flat Scratch register pair initialization is described in
+ :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
The global segment can be accessed either using buffer instructions (GFX6 which
has V# 64-bit address support), flat instructions (GFX7-GFX10), or global
@@ -4474,48 +4376,98 @@ pointer are replaced with immediate ``0`` offsets.
Flat Scratch
++++++++++++
-GFX6
- Flat scratch is not supported.
+There are different methods used for initializing flat scratch:
+
+* If the *Target Properties* column of :ref:`amdgpu-processor-table`
+ specifies *Does not support generic address space*:
+
+ Flat scratch is not supported and there is no flat scratch register pair.
+
+* If the *Target Properties* column of :ref:`amdgpu-processor-table`
+ specifies *Offset flat scratch*:
+
+ If the kernel or any function it calls may use flat operations to access
+ scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
+ (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI). Initialization uses Flat Scratch Init and
+ Scratch Wavefront Offset SGPR registers (see
+ :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
+
+ 1. The low word of Flat Scratch Init is the 32-bit byte offset from
+ ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
+ being managed by SPI for the queue executing the kernel dispatch. This is
+ the same value used in the Scratch Segment Buffer V# base address.
+
+ CP obtains this from the runtime. (The Scratch Segment Buffer base address
+ is ``SH_HIDDEN_PRIVATE_BASE_VIMID`` plus this offset.)
+
+ The prolog must add the value of Scratch Wavefront Offset to get the byte
+ offset of the wavefront's scratch backing memory from
+ ``SH_HIDDEN_PRIVATE_BASE_VIMID``.
+
+ The Scratch Wavefront Offset must also be used as an offset with the Private
+ segment address when using the Scratch Segment Buffer.
+
+ Since FLAT_SCRATCH_HI is in units of 256 bytes, the offset must be right
+ shifted by 8 before moving into FLAT_SCRATCH_HI.
+
+ FLAT_SCRATCH_HI corresponds to SGPRn-4 on GFX7, and SGPRn-6 on GFX8 (where
+ SGPRn is the highest numbered SGPR allocated to the wavefront).
+ FLAT_SCRATCH_HI is multiplied by 256 (as it is in units of 256 bytes) and
+ added to ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to calculate the per wavefront
+ FLAT SCRATCH BASE in flat memory instructions that access the scratch
+ aperture.
+ 2. The second word of Flat Scratch Init is the 32-bit byte size of a single
+ work-item's scratch memory usage.
+
+ CP obtains this from the runtime, and it is always a multiple of DWORD. CP
+ checks that the value in the kernel dispatch packet Private Segment Byte
+ Size is not larger and requests the runtime to increase the queue's scratch
+ size if necessary.
+
+ CP loads it directly from the kernel dispatch packet Private Segment Byte
+ Size field and rounds it up to a multiple of DWORD. Having CP load it once
+ avoids loading it at the beginning of every wavefront.
+
+ The kernel prolog code must move it to FLAT_SCRATCH_LO which is SGPRn-3 on
+ GFX7 and SGPRn-5 on GFX8. FLAT_SCRATCH_LO is used as the FLAT SCRATCH SIZE
+ in flat memory instructions.
+
+* If the *Target Properties* column of :ref:`amdgpu-processor-table`
+ specifies *Absolute flat scratch*:
-GFX7-GFX10
If the kernel or any function it calls may use flat operations to access
scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
(FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which are in SGPRn-4/SGPRn-3). Initialization
uses Flat Scratch Init and Scratch Wavefront Offset SGPR registers (see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
- GFX7-GFX8
-
- 1. The low word of Flat Scratch Init is 32-bit byte offset from
- ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
- being managed by SPI for the queue executing the kernel dispatch. This is
- the same value used in the Scratch Segment Buffer V# base address. The
- prolog must add the value of Scratch Wavefront Offset to get the
- wavefront's byte scratch backing memory offset from
- ``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since FLAT_SCRATCH_LO is in units of 256
- bytes, the offset must be right shifted by 8 before moving into
- FLAT_SCRATCH_LO.
- 2. The second word of Flat Scratch Init is 32-bit byte size of a single
- work-items scratch memory usage. This is directly loaded from the kernel
- dispatch packet Private Segment Byte Size and rounded up to a multiple of
- DWORD. Having CP load it once avoids loading it at the beginning of every
- wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT
- SCRATCH SIZE.
-
- GFX9-GFX10
- The Flat Scratch Init is the 64-bit address of the base of scratch backing
- memory being managed by SPI for the queue executing the kernel dispatch. The
- prolog must add the value of the wave's Scratch Wavefront Offset and moved
- as a 64-bit value to the FLAT_SCRATCH pair for use as the flat scratch base
- in flat memory instructions.
+ The Flat Scratch Init is the 64-bit address of the base of scratch backing
+ memory being managed by SPI for the queue executing the kernel dispatch.
+
+ CP obtains this from the runtime.
+
+ The kernel prolog must add the value of the wave's Scratch Wavefront Offset
+ and move the result as a 64-bit value to the FLAT_SCRATCH SGPR register pair
+ which is SGPRn-6 and SGPRn-5. It is used as the FLAT SCRATCH BASE in flat
+ memory instructions.
+
+ The Scratch Wavefront Offset must also be used as an offset with Private
+ segment address when using the Scratch Segment Buffer (see
+ :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`).
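Both initialization methods above reduce to a few unsigned 32-bit integer operations. The following Python sketch models them and is not the LLVM prolog code. The function names are hypothetical, registers are modelled as plain integers, and ``SH_HIDDEN_PRIVATE_BASE_VIMID`` is treated as a known integer value. For the Offset method, the sketch assumes the summed byte offset is a multiple of 256, because the right shift by 8 discards the low bits.

```python
MASK32 = 0xFFFF_FFFF


def offset_flat_scratch(init_lo: int, init_hi: int, wave_offset: int):
    """Offset flat scratch: return (FLAT_SCRATCH_LO, FLAT_SCRATCH_HI)."""
    # Byte offset of this wavefront's scratch from SH_HIDDEN_PRIVATE_BASE_VIMID
    # (a 32-bit add that wraps like an unsigned 32-bit register).
    wave_byte_offset = (init_lo + wave_offset) & MASK32
    # FLAT_SCRATCH_HI is in units of 256 bytes, so shift right by 8.
    # Any low 8 bits are discarded; this sketch assumes they are zero.
    flat_scratch_hi = wave_byte_offset >> 8
    # FLAT_SCRATCH_LO is the per-work-item scratch size in bytes.
    flat_scratch_lo = init_hi & MASK32
    return flat_scratch_lo, flat_scratch_hi


def offset_flat_scratch_base(sh_hidden_private_base: int,
                             flat_scratch_hi: int) -> int:
    """Per-wavefront FLAT SCRATCH BASE used by flat scratch accesses."""
    return sh_hidden_private_base + flat_scratch_hi * 256


def offset_flat_scratch_sgprs(gfx_major: int, n: int) -> dict:
    """SGPR indices of the pair, where n is the highest allocated SGPR."""
    if gfx_major == 7:
        return {"FLAT_SCRATCH_LO": n - 3, "FLAT_SCRATCH_HI": n - 4}
    if gfx_major == 8:
        return {"FLAT_SCRATCH_LO": n - 5, "FLAT_SCRATCH_HI": n - 6}
    raise ValueError("Offset flat scratch applies to GFX7 and GFX8 only")


def absolute_flat_scratch(init: int, wave_offset: int):
    """Absolute flat scratch: return (FLAT_SCRATCH_LO, FLAT_SCRATCH_HI)."""
    # Low half: a 32-bit add that may produce a carry out.
    low_sum = (init & MASK32) + (wave_offset & MASK32)
    carry = low_sum >> 32
    # High half: add the carry into the upper 32 bits of the base address.
    high = ((init >> 32) + carry) & MASK32
    return low_sum & MASK32, high
```

The Absolute path splits the 64-bit add into a low add that produces a carry and a high add that consumes it, which is how a 64-bit add is usually built from 32-bit scalar operations. For example, a Flat Scratch Init of ``0x1_FFFF_F000`` plus a wavefront offset of ``0x2000`` gives a low word of ``0x1000`` and a high word of ``0x2``.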
.. _amdgpu-amdhsa-kernel-prolog-private-segment-buffer:
Private Segment Buffer
++++++++++++++++++++++
-A set of four SGPRs beginning at a four-aligned SGPR index are always selected
-to serve as the scratch V# for the kernel as follows:
+The Private Segment Buffer SGPR register is used to initialize 4 SGPRs
+that are used as a V# to access scratch. CP uses the value provided by the
+runtime. It is used, together with Scratch Wavefront Offset as an offset, to
+access the private memory space using a segment address. See
+:ref:`amdgpu-amdhsa-initial-kernel-execution-state`.
+
+The scratch V# is a set of four SGPRs starting at a four-aligned SGPR index,
+and is always selected for the kernel as follows:
- If it is known during instruction selection that there is stack usage,
SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if