[llvm] [AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (PR #75030)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 11 00:25:58 PST 2023
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Piotr Sobczak (piotrAMD)
<details>
<summary>Changes</summary>
---
Patch is 164.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75030.diff
26 Files Affected:
- (modified) llvm/docs/AMDGPUUsage.rst (+119-102)
- (modified) llvm/include/llvm/Support/AMDHSAKernelDescriptor.h (+10-2)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+7-6)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (+2-2)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+1-1)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+5-3)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+3-3)
- (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+26-2)
- (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+11-2)
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+9)
- (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+12-6)
- (modified) llvm/lib/Target/AMDGPU/SIDefines.h (+3)
- (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+1-1)
- (modified) llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp (+14-8)
- (modified) llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h (+3-1)
- (modified) llvm/lib/Target/AMDGPU/SIProgramInfo.cpp (+31-10)
- (modified) llvm/lib/Target/AMDGPU/SIProgramInfo.h (+5-2)
- (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+11-4)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-fmed3-const-combine.ll (+53)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll (+107)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3-min-max-const-combine.ll (+121)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.rsq.clamp.ll (+89)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir (+75)
- (modified) llvm/test/CodeGen/AMDGPU/amdpal-msgpack-ieee.ll (+8)
- (modified) llvm/test/CodeGen/AMDGPU/clamp.ll (+637)
- (added) llvm/test/MC/AMDGPU/hsa-gfx12-v4.s (+294)
``````````diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7fb3d70bbeffe..c7327623493e2 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1211,10 +1211,12 @@ The AMDGPU backend supports the following LLVM IR attributes.
"amdgpu-flat-work-group-size" value, the implied occupancy
bounds by the workgroup size takes precedence.
- "amdgpu-ieee" true/false. Specify whether the function expects the IEEE field of the
+ "amdgpu-ieee" true/false. GFX6-GFX11 Only
+ Specify whether the function expects the IEEE field of the
mode register to be set on entry. Overrides the default for
the calling convention.
- "amdgpu-dx10-clamp" true/false. Specify whether the function expects the DX10_CLAMP field of
+ "amdgpu-dx10-clamp" true/false. GFX6-GFX11 Only
+ Specify whether the function expects the DX10_CLAMP field of
the mode register to be set on entry. Overrides the default
for the calling convention.
@@ -4390,21 +4392,21 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC3``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table`.
415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC1``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC2``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
458:448 7 bits *See separate bits below.* Enable the setup of the
SGPR user data registers
(see
@@ -4472,8 +4474,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc1 for GFX6-GFX11
- :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table
+ .. table:: compute_pgm_rsrc1 for GFX6-GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4642,17 +4644,27 @@ The fields used by CP for code objects before V3 also match those specified in
CP is responsible for
filling in
``COMPUTE_PGM_RSRC1.PRIV``.
- 21 1 bit ENABLE_DX10_CLAMP Wavefront starts execution
- with DX10 clamp mode
- enabled. Used by the vector
- ALU to force DX10 style
- treatment of NaN's (when
- set, clamp NaN to zero,
- otherwise pass NaN
- through).
+ 21 1 bit ENABLE_DX10_CLAMP GFX9-GFX11
+ Wavefront starts execution
+ with DX10 clamp mode
+ enabled. Used by the vector
+ ALU to force DX10 style
+ treatment of NaN's (when
+ set, clamp NaN to zero,
+ otherwise pass NaN
+ through).
- Used by CP to set up
- ``COMPUTE_PGM_RSRC1.DX10_CLAMP``.
+ Used by CP to set up
+ ``COMPUTE_PGM_RSRC1.DX10_CLAMP``.
+ WG_RR_EN GFX12
+ If 1, wavefronts are scheduled
+ in a round-robin fashion with
+ respect to the other wavefronts
+ of the SIMD. Otherwise, wavefronts
+ are scheduled in oldest age order.
+
+ CP is responsible for filling in
+ ``COMPUTE_PGM_RSRC1.WG_RR_EN``.
22 1 bit DEBUG_MODE Must be 0.
Start executing wavefront
@@ -4661,21 +4673,24 @@ The fields used by CP for code objects before V3 also match those specified in
CP is responsible for
filling in
``COMPUTE_PGM_RSRC1.DEBUG_MODE``.
- 23 1 bit ENABLE_IEEE_MODE Wavefront starts execution
- with IEEE mode
- enabled. Floating point
- opcodes that support
- exception flag gathering
- will quiet and propagate
- signaling-NaN inputs per
- IEEE 754-2008. Min_dx10 and
- max_dx10 become IEEE
- 754-2008 compliant due to
- signaling-NaN propagation
- and quieting.
+ 23 1 bit ENABLE_IEEE_MODE GFX9-GFX11
+ Wavefront starts execution
+ with IEEE mode
+ enabled. Floating point
+ opcodes that support
+ exception flag gathering
+ will quiet and propagate
+ signaling-NaN inputs per
+ IEEE 754-2008. Min_dx10 and
+ max_dx10 become IEEE
+ 754-2008 compliant due to
+ signaling-NaN propagation
+ and quieting.
- Used by CP to set up
- ``COMPUTE_PGM_RSRC1.IEEE_MODE``.
+ Used by CP to set up
+ ``COMPUTE_PGM_RSRC1.IEEE_MODE``.
+ DISABLE_PERF GFX12
+ Reserved. Must be 0.
24 1 bit BULKY Must be 0.
Only one work-group allowed
@@ -4763,8 +4778,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc2 for GFX6-GFX11
- :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table
+ .. table:: compute_pgm_rsrc2 for GFX6-GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4957,8 +4972,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc3 for GFX10-GFX11
- :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table
+ .. table:: compute_pgm_rsrc3 for GFX10-GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -5437,7 +5452,7 @@ There are different methods used for initializing flat scratch:
specifies *Architected flat scratch*:
If ENABLE_PRIVATE_SEGMENT is enabled in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table` then the FLAT_SCRATCH
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table` then the FLAT_SCRATCH
register pair will be initialized to the 64-bit address of the base of scratch
backing memory being managed by SPI for the queue executing the kernel
dispatch plus the value of the wave's Scratch Wavefront Offset for use as the
@@ -11819,7 +11834,7 @@ Wavefronts are executed in native mode with in-order reporting of loads and
sample instructions. In this mode vmcnt reports completion of load, atomic with
return and sample instructions in order, and the vscnt reports the completion of
store and atomic without return in order. See ``MEM_ORDERED`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
Wavefronts can be executed in WGP or CU wavefront execution mode:
@@ -11835,7 +11850,7 @@ Wavefronts can be executed in WGP or CU wavefront execution mode:
work-group synchronization.
See ``WGP_MODE`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table` and
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table` and
:ref:`amdgpu-target-features`.
The code sequences used to implement the memory model for GFX10-GFX11 are defined in
@@ -15375,123 +15390,125 @@ terminated by an ``.end_amdhsa_kernel`` directive.
======================================================== =================== ============ ===================
Directive Default Supported On Description
======================================================== =================== ============ ===================
- ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX11 Controls GROUP_SEGMENT_FIXED_SIZE in
+ ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX12 Controls GROUP_SEGMENT_FIXED_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX11 Controls PRIVATE_SEGMENT_FIXED_SIZE in
+ ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX12 Controls PRIVATE_SEGMENT_FIXED_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_kernarg_size`` 0 GFX6-GFX11 Controls KERNARG_SIZE in
+ ``.amdhsa_kernarg_size`` 0 GFX6-GFX12 Controls KERNARG_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX11 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`
+ ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX12 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`
``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
GFX940)
- ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_PTR in
+ ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_DISPATCH_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_QUEUE_PTR in
+ ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_QUEUE_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
+ ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_ID in
+ ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX12 Controls ENABLE_SGPR_DISPATCH_ID in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
GFX940)
- ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX11 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
+ ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_wavefront_size32`` Target GFX10-GFX11 Controls ENABLE_WAVEFRONT_SIZE32 in
+ ``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
(wavefrontsize64)
- ``.amdhsa_uses_dynamic_stack`` 0 GFX6-GFX11 Controls USES_DYNAMIC_STACK in
+ ``.amdhsa_uses_dynamic_stack`` 0 GFX6-GFX12 Controls USES_DYNAMIC_STACK in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_PRIVATE_SEGMENT in
- (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
GFX940)
``.amdhsa_enable_private_segment`` 0 GFX940, Controls ENABLE_PRIVATE_SEGMENT in
- GFX11 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_X in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Y in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Z in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_INFO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX11 Controls ENABLE_VGPR_WORKITEM_ID in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ...
[truncated]
``````````
</details>
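To make the GFX12 change in the compute_pgm_rsrc1 table concrete, here is a minimal sketch of how a tool writing COMPUTE_PGM_RSRC1 might treat bits 21 and 23, following the documentation diff above. Before GFX12 those bits are ENABLE_DX10_CLAMP and ENABLE_IEEE_MODE. On GFX12 they become WG_RR_EN and DISABLE_PERF (reserved, must be 0), so the "amdgpu-dx10-clamp" and "amdgpu-ieee" requests are simply not encoded. The names `Gen`, `ModeRequest`, `encodeModeBits` and `bit21FieldName` are hypothetical and are not LLVM's API; real code would query the subtarget instead of taking a generation tag.

```cpp
#include <cstdint>

// Hypothetical sketch, not LLVM's API. Bit positions follow the
// compute_pgm_rsrc1 table in the diff above.
constexpr std::uint32_t kRsrc1Bit21 = 1u << 21; // ENABLE_DX10_CLAMP before GFX12, WG_RR_EN on GFX12
constexpr std::uint32_t kRsrc1Bit23 = 1u << 23; // ENABLE_IEEE_MODE before GFX12, DISABLE_PERF on GFX12

// Hypothetical generation tag; real code would ask the subtarget.
enum class Gen { PreGFX12, GFX12 };

// Mode the function expects on entry, e.g. derived from the
// "amdgpu-dx10-clamp" and "amdgpu-ieee" attributes.
struct ModeRequest {
  bool dx10Clamp;
  bool ieee;
};

// Returns rsrc1 with bits 21 and 23 set according to the generation's layout.
constexpr std::uint32_t encodeModeBits(std::uint32_t rsrc1, Gen gen,
                                       ModeRequest req) {
  rsrc1 &= ~(kRsrc1Bit21 | kRsrc1Bit23); // clear both bits first
  if (gen == Gen::PreGFX12) {
    if (req.dx10Clamp)
      rsrc1 |= kRsrc1Bit21; // ENABLE_DX10_CLAMP
    if (req.ieee)
      rsrc1 |= kRsrc1Bit23; // ENABLE_IEEE_MODE
  }
  // GFX12: bit 23 is DISABLE_PERF (reserved, must be 0). Bit 21 is WG_RR_EN,
  // which this sketch just leaves at 0; the mode request is ignored.
  return rsrc1;
}

// Name of the field that occupies bit 21 on the given generation.
constexpr const char *bit21FieldName(Gen gen) {
  return gen == Gen::GFX12 ? "WG_RR_EN" : "ENABLE_DX10_CLAMP";
}
```

Clearing both bits before setting them means the helper gives the same result regardless of what the incoming value already held, which keeps a stale DX10_CLAMP or IEEE bit from leaking into a GFX12 descriptor where those positions mean something else.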
https://github.com/llvm/llvm-project/pull/75030