[llvm] 1f38d38 - [AMDGPU] Fix documentation table formatting from #118750 (NFC)
Carl Ritson via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 29 21:29:46 PST 2025
Author: Carl Ritson
Date: 2025-01-30T14:27:25+09:00
New Revision: 1f38d38d544b090fd7b9b63454d8310eff0bb7d9
URL: https://github.com/llvm/llvm-project/commit/1f38d38d544b090fd7b9b63454d8310eff0bb7d9
DIFF: https://github.com/llvm/llvm-project/commit/1f38d38d544b090fd7b9b63454d8310eff0bb7d9.diff
LOG: [AMDGPU] Fix documentation table formatting from #118750 (NFC)
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 368a469d00e370..b646621d12eb0d 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1549,181 +1549,180 @@ The AMDGPU backend supports the following LLVM IR attributes.
.. table:: AMDGPU LLVM IR Attributes
:name: amdgpu-llvm-ir-attributes-table
- ======================================= ==========================================================
- LLVM Attribute Description
- ======================================= ==========================================================
- "amdgpu-flat-work-group-size"="min,max" Specify the minimum and maximum flat work group sizes that
- will be specified when the kernel is dispatched. Generated
- by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
- The IR implied default value is 1,1024. Clang may emit this attribute
- with more restrictive bounds depending on language defaults.
- If the actual block or workgroup size exceeds the limit at any point during
- the execution, the behavior is undefined. For example, even if there is
- only one active thread but the thread local id exceeds the limit, the
- behavior is undefined.
-
- "amdgpu-implicitarg-num-bytes"="n" Number of kernel argument bytes to add to the kernel
- argument block size for the implicit arguments. This
- varies by OS and language (for OpenCL see
- :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
- "amdgpu-num-sgpr"="n" Specifies the number of SGPRs to use. Generated by
- the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
- "amdgpu-num-vgpr"="n" Specifies the number of VGPRs to use. Generated by the
- ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
- "amdgpu-waves-per-eu"="m,n" Specify the minimum and maximum number of waves per
- execution unit. Generated by the ``amdgpu_waves_per_eu``
- CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
- and the backend may not be able to satisfy the request. If
- the specified range is incompatible with the function's
- "amdgpu-flat-work-group-size" value, the implied occupancy
- bounds by the workgroup size takes precedence.
-
- "amdgpu-ieee" true/false. GFX6-GFX11 Only
- Specify whether the function expects the IEEE field of the
- mode register to be set on entry. Overrides the default for
- the calling convention.
- "amdgpu-dx10-clamp" true/false. GFX6-GFX11 Only
- Specify whether the function expects the DX10_CLAMP field of
- the mode register to be set on entry. Overrides the default
- for the calling convention.
-
- "amdgpu-no-workitem-id-x" Indicates the function does not depend on the value of the
- llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
- attribute, or reached through a call site marked with this attribute, and
- that intrinsic is called, the behavior of the program is undefined. (Whole-program
- undefined behavior is used here because, for example, the absence of a required workitem
- ID in the preloaded register set can mean that all other preloaded registers
- are earlier than the compilation assumed they would be.) The backend can
- generally infer this during code generation, so typically there is no
- benefit to frontends marking functions with this.
-
- "amdgpu-no-workitem-id-y" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.workitem.id.y intrinsic.
-
- "amdgpu-no-workitem-id-z" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.workitem.id.z intrinsic.
-
- "amdgpu-no-workgroup-id-x" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.workgroup.id.x intrinsic.
-
- "amdgpu-no-workgroup-id-y" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.workgroup.id.y intrinsic.
-
- "amdgpu-no-workgroup-id-z" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.workgroup.id.z intrinsic.
-
- "amdgpu-no-dispatch-ptr" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.dispatch.ptr intrinsic.
-
- "amdgpu-no-implicitarg-ptr" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.implicitarg.ptr intrinsic.
-
- "amdgpu-no-dispatch-id" The same as amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.dispatch.id intrinsic.
-
- "amdgpu-no-queue-ptr" Similar to amdgpu-no-workitem-id-x, except for the
- llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
- attributes, the queue pointer may be required in situations where the
- intrinsic call does not directly appear in the program. Some subtargets
- require the queue pointer for to handle some addrspacecasts, as well
- as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
- llvm.debug intrinsics.
-
- "amdgpu-no-hostcall-ptr" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
- kernel argument that holds the pointer to the hostcall buffer. If this
- attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
- "amdgpu-no-heap-ptr" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
- kernel argument that holds the pointer to an initialized memory buffer
- that conforms to the requirements of the malloc/free device library V1
- version implementation. If this attribute is absent, then the
- amdgpu-no-implicitarg-ptr is also removed.
-
- "amdgpu-no-multigrid-sync-arg" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
- kernel argument that holds the multigrid synchronization pointer. If this
- attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
- "amdgpu-no-default-queue" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
- kernel argument that holds the default queue pointer. If this
- attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
- "amdgpu-no-completion-action" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
- kernel argument that holds the completion action pointer. If this
- attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
- "amdgpu-lds-size"="min[,max]" Min is the minimum number of bytes that will be allocated in the Local
- Data Store at address zero. Variables are allocated within this frame
- using absolute symbol metadata, primarily by the AMDGPULowerModuleLDS
- pass. Optional max is the maximum number of bytes that will be allocated.
- Note that min==max indicates that no further variables can be added to
- the frame. This is an internal detail of how LDS variables are lowered,
- language front ends should not set this attribute.
-
- "amdgpu-gds-size" Bytes expected to be allocated at the start of GDS memory at entry.
-
- "amdgpu-git-ptr-high" The hard-wired high half of the address of the global information table
- for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
- current hardware only allows a 16 bit value.
-
- "amdgpu-32bit-address-high-bits" Assumed high 32-bits for 32-bit address spaces which are really truncated
- 64-bit addresses (i.e., addrspace(6))
-
- "amdgpu-color-export" Indicates shader exports color information if set to 1.
- Defaults to 1 for :ref:`amdgpu_ps <amdgpu-cc>`, and 0 for other calling
- conventions. Determines the necessity and type of null exports when a shader
- terminates early by killing lanes.
-
- "amdgpu-depth-export" Indicates shader exports depth information if set to 1. Determines the
- necessity and type of null exports when a shader terminates early by killing
- lanes. A depth-only shader will export to depth channel when no null export
- target is available (GFX11+).
-
- "InitialPSInputAddr" Set the initial value of the `spi_ps_input_addr` register for
- :ref:`amdgpu_ps <amdgpu-cc>` shaders. Any bits enabled by this value will
- be enabled in the final register value.
-
- "amdgpu-wave-priority-threshold" VALU instruction count threshold for adjusting wave priority. If exceeded,
- temporarily raise the wave priority at the start of the shader function
- until its last VMEM instructions to allow younger waves to issue their VMEM
- instructions as well.
+ ============================================ ==========================================================
+ LLVM Attribute Description
+ ============================================ ==========================================================
+ "amdgpu-flat-work-group-size"="min,max" Specify the minimum and maximum flat work group sizes that
+ will be specified when the kernel is dispatched. Generated
+ by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
+ The IR implied default value is 1,1024. Clang may emit this attribute
+ with more restrictive bounds depending on language defaults.
+ If the actual block or workgroup size exceeds the limit at any point during
+ the execution, the behavior is undefined. For example, even if there is
+ only one active thread but the thread local id exceeds the limit, the
+ behavior is undefined.
+
+ "amdgpu-implicitarg-num-bytes"="n" Number of kernel argument bytes to add to the kernel
+ argument block size for the implicit arguments. This
+ varies by OS and language (for OpenCL see
+ :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
+ "amdgpu-num-sgpr"="n" Specifies the number of SGPRs to use. Generated by
+ the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
+ "amdgpu-num-vgpr"="n" Specifies the number of VGPRs to use. Generated by the
+ ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
+ "amdgpu-waves-per-eu"="m,n" Specify the minimum and maximum number of waves per
+ execution unit. Generated by the ``amdgpu_waves_per_eu``
+ CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
+ and the backend may not be able to satisfy the request. If
+ the specified range is incompatible with the function's
+ "amdgpu-flat-work-group-size" value, the implied occupancy
+ bounds by the workgroup size takes precedence.
+
+ "amdgpu-ieee" true/false. GFX6-GFX11 Only
+ Specify whether the function expects the IEEE field of the
+ mode register to be set on entry. Overrides the default for
+ the calling convention.
+ "amdgpu-dx10-clamp" true/false. GFX6-GFX11 Only
+ Specify whether the function expects the DX10_CLAMP field of
+ the mode register to be set on entry. Overrides the default
+ for the calling convention.
+
+ "amdgpu-no-workitem-id-x" Indicates the function does not depend on the value of the
+ llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
+ attribute, or reached through a call site marked with this attribute, and
+ that intrinsic is called, the behavior of the program is undefined. (Whole-program
+ undefined behavior is used here because, for example, the absence of a required workitem
+ ID in the preloaded register set can mean that all other preloaded registers
+ are earlier than the compilation assumed they would be.) The backend can
+ generally infer this during code generation, so typically there is no
+ benefit to frontends marking functions with this.
+
+ "amdgpu-no-workitem-id-y" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.workitem.id.y intrinsic.
+
+ "amdgpu-no-workitem-id-z" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.workitem.id.z intrinsic.
+
+ "amdgpu-no-workgroup-id-x" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.workgroup.id.x intrinsic.
+
+ "amdgpu-no-workgroup-id-y" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.workgroup.id.y intrinsic.
+
+ "amdgpu-no-workgroup-id-z" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.workgroup.id.z intrinsic.
+
+ "amdgpu-no-dispatch-ptr" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.dispatch.ptr intrinsic.
+
+ "amdgpu-no-implicitarg-ptr" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.implicitarg.ptr intrinsic.
+
+ "amdgpu-no-dispatch-id" The same as amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.dispatch.id intrinsic.
+
+ "amdgpu-no-queue-ptr" Similar to amdgpu-no-workitem-id-x, except for the
+ llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
+ attributes, the queue pointer may be required in situations where the
+ intrinsic call does not directly appear in the program. Some subtargets
+ require the queue pointer for to handle some addrspacecasts, as well
+ as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
+ llvm.debug intrinsics.
+
+ "amdgpu-no-hostcall-ptr" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+ kernel argument that holds the pointer to the hostcall buffer. If this
+ attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+ "amdgpu-no-heap-ptr" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+ kernel argument that holds the pointer to an initialized memory buffer
+ that conforms to the requirements of the malloc/free device library V1
+ version implementation. If this attribute is absent, then the
+ amdgpu-no-implicitarg-ptr is also removed.
+
+ "amdgpu-no-multigrid-sync-arg" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+ kernel argument that holds the multigrid synchronization pointer. If this
+ attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+ "amdgpu-no-default-queue" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+ kernel argument that holds the default queue pointer. If this
+ attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+ "amdgpu-no-completion-action" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+ kernel argument that holds the completion action pointer. If this
+ attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+ "amdgpu-lds-size"="min[,max]" Min is the minimum number of bytes that will be allocated in the Local
+ Data Store at address zero. Variables are allocated within this frame
+ using absolute symbol metadata, primarily by the AMDGPULowerModuleLDS
+ pass. Optional max is the maximum number of bytes that will be allocated.
+ Note that min==max indicates that no further variables can be added to
+ the frame. This is an internal detail of how LDS variables are lowered,
+ language front ends should not set this attribute.
+
+ "amdgpu-gds-size" Bytes expected to be allocated at the start of GDS memory at entry.
+
+ "amdgpu-git-ptr-high" The hard-wired high half of the address of the global information table
+ for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
+ current hardware only allows a 16 bit value.
+
+ "amdgpu-32bit-address-high-bits" Assumed high 32-bits for 32-bit address spaces which are really truncated
+ 64-bit addresses (i.e., addrspace(6))
+
+ "amdgpu-color-export" Indicates shader exports color information if set to 1.
+ Defaults to 1 for :ref:`amdgpu_ps <amdgpu-cc>`, and 0 for other calling
+ conventions. Determines the necessity and type of null exports when a shader
+ terminates early by killing lanes.
+
+ "amdgpu-depth-export" Indicates shader exports depth information if set to 1. Determines the
+ necessity and type of null exports when a shader terminates early by killing
+ lanes. A depth-only shader will export to depth channel when no null export
+ target is available (GFX11+).
+
+ "InitialPSInputAddr" Set the initial value of the `spi_ps_input_addr` register for
+ :ref:`amdgpu_ps <amdgpu-cc>` shaders. Any bits enabled by this value will
+ be enabled in the final register value.
+
+ "amdgpu-wave-priority-threshold" VALU instruction count threshold for adjusting wave priority. If exceeded,
+ temporarily raise the wave priority at the start of the shader function
+ until its last VMEM instructions to allow younger waves to issue their VMEM
+ instructions as well.
- "amdgpu-memory-bound" Set internally by backend
-
- "amdgpu-wave-limiter" Set internally by backend
+ "amdgpu-memory-bound" Set internally by backend
- "amdgpu-unroll-threshold" Set base cost threshold preference for loop unrolling within this function,
- default is 300. Actual threshold may be varied by per-loop metadata or
- reduced by heuristics.
+ "amdgpu-wave-limiter" Set internally by backend
- "amdgpu-max-num-workgroups"="x,y,z" Specify the maximum number of work groups for the kernel dispatch in the
- X, Y, and Z dimensions. Each number must be >= 1. Generated by the
- ``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
- emits this attribute when all the three numbers are >= 1.
+ "amdgpu-unroll-threshold" Set base cost threshold preference for loop unrolling within this function,
+ default is 300. Actual threshold may be varied by per-loop metadata or
+ reduced by heuristics.
- "amdgpu-no-agpr" Indicates the function will not require allocating AGPRs. This is only
- relevant on subtargets with AGPRs. The behavior is undefined if a
- function which requires AGPRs is reached through any function marked
- with this attribute.
+ "amdgpu-max-num-workgroups"="x,y,z" Specify the maximum number of work groups for the kernel dispatch in the
+ X, Y, and Z dimensions. Each number must be >= 1. Generated by the
+ ``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
+ emits this attribute when all the three numbers are >= 1.
- "amdgpu-hidden-argument" This attribute is used internally by the backend to mark function arguments
- as hidden. Hidden arguments are managed by the compiler and are not part of
- the explicit arguments supplied by the user.
+ "amdgpu-no-agpr" Indicates the function will not require allocating AGPRs. This is only
+ relevant on subtargets with AGPRs. The behavior is undefined if a
+ function which requires AGPRs is reached through any function marked
+ with this attribute.
- "amdgpu-sgpr-hazard-wait" Disabled SGPR hazard wait insertion if set to 0.
- Exists for testing performance impact of SGPR hazard waits only.
+ "amdgpu-hidden-argument" This attribute is used internally by the backend to mark function arguments
+ as hidden. Hidden arguments are managed by the compiler and are not part of
+ the explicit arguments supplied by the user.
- "amdgpu-sgpr-hazard-boundary-cull" Enable insertion of SGPR hazard cull sequences at function call boundaries.
- Cull sequence reduces future hazard waits, but has a performance cost.
+ "amdgpu-sgpr-hazard-wait" Disabled SGPR hazard wait insertion if set to 0.
+ Exists for testing performance impact of SGPR hazard waits only.
- "amdgpu-sgpr-hazard-mem-wait-cull" Enable insertion of SGPR hazard cull sequences before memory waits.
- Cull sequence reduces future hazard waits, but has a performance cost.
- Attempt to amortize cost by overlapping with memory accesses.
-
- "amdgpu-sgpr-hazard-mem-wait-cull-threshold"
- Sets the number of active SGPR hazards that must be present before
- inserting a cull sequence at a memory wait.
-
- ======================================= ==========================================================
+ "amdgpu-sgpr-hazard-boundary-cull" Enable insertion of SGPR hazard cull sequences at function call boundaries.
+ Cull sequence reduces future hazard waits, but has a performance cost.
+
+ "amdgpu-sgpr-hazard-mem-wait-cull" Enable insertion of SGPR hazard cull sequences before memory waits.
+ Cull sequence reduces future hazard waits, but has a performance cost.
+ Attempt to amortize cost by overlapping with memory accesses.
+
+ "amdgpu-sgpr-hazard-mem-wait-cull-threshold" Sets the number of active SGPR hazards that must be present before
+ inserting a cull sequence at a memory wait.
+
+ ============================================ ==========================================================
Calling Conventions
===================