[llvm] r329188 - Add AMDPAL Code Conventions section to AMD docs
Tim Corringham via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 4 06:02:09 PDT 2018
Author: timcorringham
Date: Wed Apr 4 06:02:09 2018
New Revision: 329188
URL: http://llvm.org/viewvc/llvm-project?rev=329188&view=rev
Log:
Add AMDPAL Code Conventions section to AMD docs
Summary:
This is a first version of the AMDPAL code conventions.
Further updates will undoubtably be required to fully
document AMDPAL.
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D45246
Modified:
llvm/trunk/docs/AMDGPUUsage.rst
Modified: llvm/trunk/docs/AMDGPUUsage.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/AMDGPUUsage.rst?rev=329188&r1=329187&r2=329188&view=diff
==============================================================================
--- llvm/trunk/docs/AMDGPUUsage.rst (original)
+++ llvm/trunk/docs/AMDGPUUsage.rst Wed Apr 4 06:02:09 2018
@@ -3781,6 +3781,125 @@ the ``s_trap`` instruction with the foll
debugger ``s_trap 0xff`` Reserved for debugger.
=================== =============== =============== =======================
+AMDPAL
+------
+
+This section provides code conventions used when the target triple OS is
+``amdpal`` (see :ref:`amdgpu-target-triples`) for passing runtime parameters
+from the application/runtime to each invocation of a hardware shader. These
+parameters include both generic, application-controlled parameters called
+*user data* as well as system-generated parameters that are a product of the
+draw or dispatch execution.
+
+User Data
+~~~~~~~~~
+
+Each hardware stage has a set of 32-bit *user data registers* which can be
+written from a command buffer and then loaded into SGPRs when waves are launched
+via a subsequent dispatch or draw operation. This is the way most arguments are
+passed from the application/runtime to a hardware shader.
+
+Compute User Data
+~~~~~~~~~~~~~~~~~
+
+Compute shader user data mappings are simpler than graphics shaders, and have a
+fixed mapping.
+
+Note that there are always 10 available *user data entries* in registers -
+entries beyond that limit must be fetched from memory (via the spill table
+pointer) by the shader.
+
+ .. table:: PAL Compute Shader User Data Registers
+ :name: pal-compute-user-data-registers
+
+ ============= ================================
+ User Register Description
+ ============= ================================
+ 0 Global Internal Table (32-bit pointer)
+ 1 Per-Shader Internal Table (32-bit pointer)
+ 2 - 11 Application-Controlled User Data (10 32-bit values)
+ 12 Spill Table (32-bit pointer)
+ 13 - 14 Thread Group Count (64-bit pointer)
+ 15 GDS Range
+ ============= ================================
+
+Graphics User Data
+~~~~~~~~~~~~~~~~~~
+
+Graphics pipelines support a much more flexible user data mapping:
+
+ .. table:: PAL Graphics Shader User Data Registers
+ :name: pal-graphics-user-data-registers
+
+ ============= ================================
+ User Register Description
+ ============= ================================
+ 0 Global Internal Table (32-bit pointer)
+ + Per-Shader Internal Table (32-bit pointer)
+ + 1-15 Application Controlled User Data
+ (1-15 Contiguous 32-bit Values in Registers)
+ + Spill Table (32-bit pointer)
+ + Draw Index (First Stage Only)
+ + Vertex Offset (First Stage Only)
+ + Instance Offset (First Stage Only)
+ ============= ================================
+
+ The placement of the global internal table remains fixed in the first *user
+ data SGPR register*. Otherwise all parameters are optional, and can be mapped
+ to any desired *user data SGPR register*, with the following regstrictions:
+
+ * Draw Index, Vertex Offset, and Instance Offset can only be used by the first
+ activehardware stage in a graphics pipeline (i.e. where the API vertex
+ shader runs).
+
+ * Application-controlled user data must be mapped into a contiguous range of
+ user data registers.
+
+ * The application-controlled user data range supports compaction remapping, so
+ only *entries* that are actually consumed by the shader must be assigned to
+ corresponding *registers*. Note that in order to support an efficient runtime
+ implementation, the remapping must pack *registers* in the same order as
+ *entries*, with unused *entries* removed.
+
+.. _pal_global_internal_table:
+
+Global Internal Table
+~~~~~~~~~~~~~~~~~~~~~
+
+The global internal table is a table of *shader resource descriptors* (SRDs) that
+define how certain engine-wide, runtime-managed resources should be accessed
+from a shader. The majority of these resources have HW-defined formats, and it
+is up to the compiler to write/read data as required by the target hardware.
+
+The following table illustrates the required format:
+
+ .. table:: PAL Global Internal Table
+ :name: pal-git-table
+
+ ============= ================================
+ Offset Description
+ ============= ================================
+ 0-3 Graphics Scratch SRD
+ 4-7 Compute Scratch SRD
+ 8-11 ES/GS Ring Output SRD
+ 12-15 ES/GS Ring Input SRD
+ 16-19 GS/VS Ring Output #0
+ 20-23 GS/VS Ring Output #1
+ 24-27 GS/VS Ring Output #2
+ 28-31 GS/VS Ring Output #3
+ 32-35 GS/VS Ring Input SRD
+ 36-39 Tessellation Factor Buffer SRD
+ 40-43 Off-Chip LDS Buffer SRD
+ 44-47 Off-Chip Param Cache Buffer SRD
+ 48-51 Sample Position Buffer SRD
+ 52 vaRange::ShadowDescriptorTable High Bits
+ ============= ================================
+
+ The pointer to the global internal table passed to the shader as user data
+ is a 32-bit pointer. The top 32 bits should be assumed to be the same as
+ the top 32 bits of the pipeline, so the shader may use the program
+ counter's top 32 bits.
+
Unspecified OS
--------------
More information about the llvm-commits
mailing list