[llvm] 5815990 - [AMDGPU] Expand IR Attribute table to handle longer names (NFC)

Thu Feb 20 23:36:33 PST 2025

Author: Carl Ritson
Date: 2025-02-21T16:34:54+09:00
New Revision: 581599096e8a1a89ccd3e053a1209c69a9079083

URL: https://github.com/llvm/llvm-project/commit/581599096e8a1a89ccd3e053a1209c69a9079083
DIFF: https://github.com/llvm/llvm-project/commit/581599096e8a1a89ccd3e053a1209c69a9079083.diff

LOG: [AMDGPU] Expand IR Attribute table to handle longer names (NFC)

Added: 
    

Modified: 
    llvm/docs/AMDGPUUsage.rst

Removed: 
    


################################################################################
diff  --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d580be1eb8cfc..9932074830866 100644

--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1546,180 +1546,181 @@ The AMDGPU backend supports the following LLVM IR attributes.
   .. table:: AMDGPU LLVM IR Attributes
      :name: amdgpu-llvm-ir-attributes-table
 
-     ============================================ ==========================================================
-     LLVM Attribute                               Description
-     ============================================ ==========================================================
-     "amdgpu-flat-work-group-size"="min,max"      Specify the minimum and maximum flat work group sizes that
-                                                  will be specified when the kernel is dispatched. Generated
-                                                  by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
-                                                  The IR implied default value is 1,1024. Clang may emit this attribute
-                                                  with more restrictive bounds depending on language defaults.
-                                                  If the actual block or workgroup size exceeds the limit at any point during
-                                                  the execution, the behavior is undefined. For example, even if there is
-                                                  only one active thread but the thread local id exceeds the limit, the
-                                                  behavior is undefined.
-
-     "amdgpu-implicitarg-num-bytes"="n"           Number of kernel argument bytes to add to the kernel
-                                                  argument block size for the implicit arguments. This
-                                                  varies by OS and language (for OpenCL see
-                                                  :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
-     "amdgpu-num-sgpr"="n"                        Specifies the number of SGPRs to use. Generated by
-                                                  the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
-     "amdgpu-num-vgpr"="n"                        Specifies the number of VGPRs to use. Generated by the
-                                                  ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
-     "amdgpu-waves-per-eu"="m,n"                  Specify the minimum and maximum number of waves per
-                                                  execution unit. Generated by the ``amdgpu_waves_per_eu``
-                                                  CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
-                                                  and the backend may not be able to satisfy the request. If
-                                                  the specified range is incompatible with the function's
-                                                  "amdgpu-flat-work-group-size" value, the occupancy bounds
-                                                  implied by the workgroup size take precedence.
-
-     "amdgpu-ieee" true/false.                    GFX6-GFX11 Only
-                                                  Specify whether the function expects the IEEE field of the
-                                                  mode register to be set on entry. Overrides the default for
-                                                  the calling convention.
-     "amdgpu-dx10-clamp" true/false.              GFX6-GFX11 Only
-                                                  Specify whether the function expects the DX10_CLAMP field of
-                                                  the mode register to be set on entry. Overrides the default
-                                                  for the calling convention.
-
-     "amdgpu-no-workitem-id-x"                    Indicates the function does not depend on the value of the
-                                                  llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
-                                                  attribute, or reached through a call site marked with this attribute, and
-                                                  that intrinsic is called, the behavior of the program is undefined. (Whole-program
-                                                  undefined behavior is used here because, for example, the absence of a required workitem
-                                                  ID in the preloaded register set can mean that all other preloaded registers
-                                                  are earlier than the compilation assumed they would be.) The backend can
-                                                  generally infer this during code generation, so typically there is no
-                                                  benefit to frontends marking functions with this.
-
-     "amdgpu-no-workitem-id-y"                    The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workitem.id.y intrinsic.
-
-     "amdgpu-no-workitem-id-z"                    The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workitem.id.z intrinsic.
-
-     "amdgpu-no-workgroup-id-x"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.x intrinsic.
-
-     "amdgpu-no-workgroup-id-y"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.y intrinsic.
-
-     "amdgpu-no-workgroup-id-z"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.z intrinsic.
-
-     "amdgpu-no-dispatch-ptr"                     The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.dispatch.ptr intrinsic.
-
-     "amdgpu-no-implicitarg-ptr"                  The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.implicitarg.ptr intrinsic.
-
-     "amdgpu-no-dispatch-id"                      The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.dispatch.id intrinsic.
-
-     "amdgpu-no-queue-ptr"                        Similar to amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
-                                                  attributes, the queue pointer may be required in situations where the
-                                                  intrinsic call does not directly appear in the program. Some subtargets
-                                                  require the queue pointer to handle some addrspacecasts, as well
-                                                  as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
-                                                  llvm.debug intrinsics.
-
-     "amdgpu-no-hostcall-ptr"                     Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the pointer to the hostcall buffer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-heap-ptr"                         Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the pointer to an initialized memory buffer
-                                                  that conforms to the requirements of the malloc/free device library V1
-                                                  version implementation. If this attribute is absent, then the
-                                                  amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-multigrid-sync-arg"               Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the multigrid synchronization pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-default-queue"                    Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the default queue pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-completion-action"                Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the completion action pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-lds-size"="min[,max]"                Min is the minimum number of bytes that will be allocated in the Local
-                                                  Data Store at address zero. Variables are allocated within this frame
-                                                  using absolute symbol metadata, primarily by the AMDGPULowerModuleLDS
-                                                  pass. Optional max is the maximum number of bytes that will be allocated.
-                                                  Note that min==max indicates that no further variables can be added to
-                                                  the frame. This is an internal detail of how LDS variables are lowered,
-                                                  language front ends should not set this attribute.
-
-     "amdgpu-gds-size"                            Bytes expected to be allocated at the start of GDS memory at entry.
-
-     "amdgpu-git-ptr-high"                        The hard-wired high half of the address of the global information table
-                                                  for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
-                                                  current hardware only allows a 16 bit value.
-
-     "amdgpu-32bit-address-high-bits"             Assumed high 32-bits for 32-bit address spaces which are really truncated
-                                                  64-bit addresses (i.e., addrspace(6))
-
-     "amdgpu-color-export"                        Indicates shader exports color information if set to 1.
-                                                  Defaults to 1 for :ref:`amdgpu_ps <amdgpu-cc>`, and 0 for other calling
-                                                  conventions. Determines the necessity and type of null exports when a shader
-                                                  terminates early by killing lanes.
-
-     "amdgpu-depth-export"                        Indicates shader exports depth information if set to 1. Determines the
-                                                  necessity and type of null exports when a shader terminates early by killing
-                                                  lanes. A depth-only shader will export to depth channel when no null export
-                                                  target is available (GFX11+).
-
-     "InitialPSInputAddr"                         Set the initial value of the `spi_ps_input_addr` register for
-                                                  :ref:`amdgpu_ps <amdgpu-cc>` shaders. Any bits enabled by this value will
-                                                  be enabled in the final register value.
-
-     "amdgpu-wave-priority-threshold"             VALU instruction count threshold for adjusting wave priority. If exceeded,
-                                                  temporarily raise the wave priority at the start of the shader function
-                                                  until its last VMEM instructions to allow younger waves to issue their VMEM
-                                                  instructions as well.
+     ================================================ ==========================================================
+     LLVM Attribute                                   Description
+     ================================================ ==========================================================
+     "amdgpu-flat-work-group-size"="min,max"          Specify the minimum and maximum flat work group sizes that
+                                                      will be specified when the kernel is dispatched. Generated
+                                                      by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
+                                                      The IR implied default value is 1,1024. Clang may emit this attribute
+                                                      with more restrictive bounds depending on language defaults.
+                                                      If the actual block or workgroup size exceeds the limit at any point during
+                                                      the execution, the behavior is undefined. For example, even if there is
+                                                      only one active thread but the thread local id exceeds the limit, the
+                                                      behavior is undefined.
+
+     "amdgpu-implicitarg-num-bytes"="n"               Number of kernel argument bytes to add to the kernel
+                                                      argument block size for the implicit arguments. This
+                                                      varies by OS and language (for OpenCL see
+                                                      :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
+     "amdgpu-num-sgpr"="n"                            Specifies the number of SGPRs to use. Generated by
+                                                      the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
+     "amdgpu-num-vgpr"="n"                            Specifies the number of VGPRs to use. Generated by the
+                                                      ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
+     "amdgpu-waves-per-eu"="m,n"                      Specify the minimum and maximum number of waves per
+                                                      execution unit. Generated by the ``amdgpu_waves_per_eu``
+                                                      CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
+                                                      and the backend may not be able to satisfy the request. If
+                                                      the specified range is incompatible with the function's
+                                                      "amdgpu-flat-work-group-size" value, the occupancy bounds
+                                                      implied by the workgroup size take precedence.
+
+     "amdgpu-ieee" true/false.                        GFX6-GFX11 Only
+                                                      Specify whether the function expects the IEEE field of the
+                                                      mode register to be set on entry. Overrides the default for
+                                                      the calling convention.
+     "amdgpu-dx10-clamp" true/false.                  GFX6-GFX11 Only
+                                                      Specify whether the function expects the DX10_CLAMP field of
+                                                      the mode register to be set on entry. Overrides the default
+                                                      for the calling convention.
+
+     "amdgpu-no-workitem-id-x"                        Indicates the function does not depend on the value of the
+                                                      llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
+                                                      attribute, or reached through a call site marked with this attribute,
+                                                      and that intrinsic is called, the behavior of the program is undefined.
+                                                      (Whole-program undefined behavior is used here because, for example,
+                                                      the absence of a required workitem ID in the preloaded register set can
+                                                      mean that all other preloaded registers are earlier than the compilation
+                                                      assumed they would be.) The backend can generally infer this during code
+                                                      generation, so typically there is no benefit to frontends marking
+                                                      functions with this.
+
+     "amdgpu-no-workitem-id-y"                        The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.workitem.id.y intrinsic.
+
+     "amdgpu-no-workitem-id-z"                        The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.workitem.id.z intrinsic.
+
+     "amdgpu-no-workgroup-id-x"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.workgroup.id.x intrinsic.
+
+     "amdgpu-no-workgroup-id-y"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.workgroup.id.y intrinsic.
+
+     "amdgpu-no-workgroup-id-z"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.workgroup.id.z intrinsic.
+
+     "amdgpu-no-dispatch-ptr"                         The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.dispatch.ptr intrinsic.
+
+     "amdgpu-no-implicitarg-ptr"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.implicitarg.ptr intrinsic.
+
+     "amdgpu-no-dispatch-id"                          The same as amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.dispatch.id intrinsic.
+
+     "amdgpu-no-queue-ptr"                            Similar to amdgpu-no-workitem-id-x, except for the
+                                                      llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
+                                                      attributes, the queue pointer may be required in situations where the
+                                                      intrinsic call does not directly appear in the program. Some subtargets
+                                                      require the queue pointer to handle some addrspacecasts, as well
+                                                      as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
+                                                      llvm.debug intrinsics.
+
+     "amdgpu-no-hostcall-ptr"                         Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                      kernel argument that holds the pointer to the hostcall buffer. If this
+                                                      attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-no-heap-ptr"                             Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                      kernel argument that holds the pointer to an initialized memory buffer
+                                                      that conforms to the requirements of the malloc/free device library V1
+                                                      version implementation. If this attribute is absent, then the
+                                                      amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-no-multigrid-sync-arg"                   Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                      kernel argument that holds the multigrid synchronization pointer. If this
+                                                      attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-no-default-queue"                        Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                      kernel argument that holds the default queue pointer. If this
+                                                      attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-no-completion-action"                    Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                      kernel argument that holds the completion action pointer. If this
+                                                      attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-lds-size"="min[,max]"                    Min is the minimum number of bytes that will be allocated in the Local
+                                                      Data Store at address zero. Variables are allocated within this frame
+                                                      using absolute symbol metadata, primarily by the AMDGPULowerModuleLDS
+                                                      pass. Optional max is the maximum number of bytes that will be allocated.
+                                                      Note that min==max indicates that no further variables can be added to
+                                                      the frame. This is an internal detail of how LDS variables are lowered,
+                                                      language front ends should not set this attribute.
+
+     "amdgpu-gds-size"                                Bytes expected to be allocated at the start of GDS memory at entry.
+
+     "amdgpu-git-ptr-high"                            The hard-wired high half of the address of the global information table
+                                                      for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
+                                                      current hardware only allows a 16 bit value.
+
+     "amdgpu-32bit-address-high-bits"                 Assumed high 32-bits for 32-bit address spaces which are really truncated
+                                                      64-bit addresses (i.e., addrspace(6))
+
+     "amdgpu-color-export"                            Indicates shader exports color information if set to 1.
+                                                      Defaults to 1 for :ref:`amdgpu_ps <amdgpu-cc>`, and 0 for other calling
+                                                      conventions. Determines the necessity and type of null exports when a shader
+                                                      terminates early by killing lanes.
+
+     "amdgpu-depth-export"                            Indicates shader exports depth information if set to 1. Determines the
+                                                      necessity and type of null exports when a shader terminates early by killing
+                                                      lanes. A depth-only shader will export to depth channel when no null export
+                                                      target is available (GFX11+).
+
+     "InitialPSInputAddr"                             Set the initial value of the `spi_ps_input_addr` register for
+                                                      :ref:`amdgpu_ps <amdgpu-cc>` shaders. Any bits enabled by this value will
+                                                      be enabled in the final register value.
+
+     "amdgpu-wave-priority-threshold"                 VALU instruction count threshold for adjusting wave priority. If exceeded,
+                                                      temporarily raise the wave priority at the start of the shader function
+                                                      until its last VMEM instructions to allow younger waves to issue their VMEM
+                                                      instructions as well.
+
+     "amdgpu-memory-bound"                            Set internally by backend
 
-     "amdgpu-memory-bound"                        Set internally by backend
+     "amdgpu-wave-limiter"                            Set internally by backend
 
-     "amdgpu-wave-limiter"                        Set internally by backend
+     "amdgpu-unroll-threshold"                        Set base cost threshold preference for loop unrolling within this function,
+                                                      default is 300. Actual threshold may be varied by per-loop metadata or
+                                                      reduced by heuristics.
 
-     "amdgpu-unroll-threshold"                    Set base cost threshold preference for loop unrolling within this function,
-                                                  default is 300. Actual threshold may be varied by per-loop metadata or
-                                                  reduced by heuristics.
+     "amdgpu-max-num-workgroups"="x,y,z"              Specify the maximum number of work groups for the kernel dispatch in the
+                                                      X, Y, and Z dimensions. Each number must be >= 1. Generated by the
+                                                      ``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
+                                                      emits this attribute when all the three numbers are >= 1.
 
-     "amdgpu-max-num-workgroups"="x,y,z"          Specify the maximum number of work groups for the kernel dispatch in the
-                                                  X, Y, and Z dimensions. Each number must be >= 1. Generated by the
-                                                  ``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
-                                                  emits this attribute when all the three numbers are >= 1.
+     "amdgpu-no-agpr"                                 Indicates the function will not require allocating AGPRs. This is only
+                                                      relevant on subtargets with AGPRs. The behavior is undefined if a
+                                                      function which requires AGPRs is reached through any function marked
+                                                      with this attribute.
 
-     "amdgpu-no-agpr"                             Indicates the function will not require allocating AGPRs. This is only
-                                                  relevant on subtargets with AGPRs. The behavior is undefined if a
-                                                  function which requires AGPRs is reached through any function marked
-                                                  with this attribute.
+     "amdgpu-hidden-argument"                         This attribute is used internally by the backend to mark function arguments
+                                                      as hidden. Hidden arguments are managed by the compiler and are not part of
+                                                      the explicit arguments supplied by the user.
 
-     "amdgpu-hidden-argument"                     This attribute is used internally by the backend to mark function arguments
-                                                  as hidden. Hidden arguments are managed by the compiler and are not part of
-                                                  the explicit arguments supplied by the user.
+     "amdgpu-sgpr-hazard-wait"                        Disable SGPR hazard wait insertion if set to 0.
+                                                      Exists for testing performance impact of SGPR hazard waits only.
 
-     "amdgpu-sgpr-hazard-wait"                    Disable SGPR hazard wait insertion if set to 0.
-                                                  Exists for testing performance impact of SGPR hazard waits only.
+     "amdgpu-sgpr-hazard-boundary-cull"               Enable insertion of SGPR hazard cull sequences at function call boundaries.
+                                                      Cull sequence reduces future hazard waits, but has a performance cost.
 
-     "amdgpu-sgpr-hazard-boundary-cull"           Enable insertion of SGPR hazard cull sequences at function call boundaries.
-                                                  Cull sequence reduces future hazard waits, but has a performance cost.
-
-     "amdgpu-sgpr-hazard-mem-wait-cull"           Enable insertion of SGPR hazard cull sequences before memory waits.
-                                                  Cull sequence reduces future hazard waits, but has a performance cost.
-                                                  Attempt to amortize cost by overlapping with memory accesses.
-
-     "amdgpu-sgpr-hazard-mem-wait-cull-threshold" Sets the number of active SGPR hazards that must be present before
-                                                  inserting a cull sequence at a memory wait.
-
-     ============================================ ==========================================================
+     "amdgpu-sgpr-hazard-mem-wait-cull"               Enable insertion of SGPR hazard cull sequences before memory waits.
+                                                      Cull sequence reduces future hazard waits, but has a performance cost.
+                                                      Attempt to amortize cost by overlapping with memory accesses.
+
+     "amdgpu-sgpr-hazard-mem-wait-cull-threshold"     Sets the number of active SGPR hazards that must be present before
+                                                      inserting a cull sequence at a memory wait.
+
+     ================================================ ==========================================================
 
 Calling Conventions
 ===================