[flang-commits] [compiler-rt] [lld] [lldb] [clang] [flang] [libcxx] [clang-tools-extra] [libc] [llvm] [AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (PR #75030)
Piotr Sobczak via flang-commits
flang-commits at lists.llvm.org
Wed Dec 13 03:34:26 PST 2023
https://github.com/piotrAMD updated https://github.com/llvm/llvm-project/pull/75030
From fe45c2b633cceaaa4d6dc08d851e197a259b0aab Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
Date: Mon, 11 Dec 2023 09:13:46 +0100
Subject: [PATCH 1/4] [AMDGPU] Update IEEE and DX10_CLAMP for GFX12
---
llvm/docs/AMDGPUUsage.rst | 221 +++---
.../llvm/Support/AMDHSAKernelDescriptor.h | 12 +-
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 13 +-
.../Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | 4 +-
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 2 +-
.../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 8 +-
.../AMDGPU/AMDGPUTargetTransformInfo.cpp | 6 +-
.../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 28 +-
.../Disassembler/AMDGPUDisassembler.cpp | 13 +-
llvm/lib/Target/AMDGPU/GCNSubtarget.h | 9 +
.../MCTargetDesc/AMDGPUTargetStreamer.cpp | 18 +-
llvm/lib/Target/AMDGPU/SIDefines.h | 3 +
.../Target/AMDGPU/SIMachineFunctionInfo.cpp | 2 +-
.../Target/AMDGPU/SIModeRegisterDefaults.cpp | 22 +-
.../Target/AMDGPU/SIModeRegisterDefaults.h | 4 +-
llvm/lib/Target/AMDGPU/SIProgramInfo.cpp | 41 +-
llvm/lib/Target/AMDGPU/SIProgramInfo.h | 7 +-
.../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | 15 +-
.../GlobalISel/clamp-fmed3-const-combine.ll | 53 ++
.../GlobalISel/clamp-minmax-const-combine.ll | 107 +++
.../GlobalISel/fmed3-min-max-const-combine.ll | 121 ++++
.../GlobalISel/llvm.amdgcn.rsq.clamp.ll | 89 +++
.../regbankcombiner-clamp-fmed3-const.mir | 75 +++
.../CodeGen/AMDGPU/amdpal-msgpack-ieee.ll | 8 +
llvm/test/CodeGen/AMDGPU/clamp.ll | 637 ++++++++++++++++++
llvm/test/MC/AMDGPU/hsa-gfx12-v4.s | 294 ++++++++
26 files changed, 1657 insertions(+), 155 deletions(-)
create mode 100644 llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7fb3d70bbeffeb..c7327623493e2d 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1211,10 +1211,12 @@ The AMDGPU backend supports the following LLVM IR attributes.
"amdgpu-flat-work-group-size" value, the implied occupancy
bounds by the workgroup size takes precedence.
- "amdgpu-ieee" true/false. Specify whether the function expects the IEEE field of the
+ "amdgpu-ieee" true/false. GFX6-GFX11 Only
+ Specify whether the function expects the IEEE field of the
mode register to be set on entry. Overrides the default for
the calling convention.
- "amdgpu-dx10-clamp" true/false. Specify whether the function expects the DX10_CLAMP field of
+ "amdgpu-dx10-clamp" true/false. GFX6-GFX11 Only
+ Specify whether the function expects the DX10_CLAMP field of
the mode register to be set on entry. Overrides the default
for the calling convention.
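For reference, both attributes above are plain string function attributes in LLVM IR, and with this change the backend only consults them on targets that still have the corresponding mode-register bits (GFX6-GFX11). A minimal IR sketch, with an illustrative kernel name and body:

    define amdgpu_kernel void @example_kernel(ptr addrspace(1) %out) #0 {
    entry:
      store float 1.000000e+00, ptr addrspace(1) %out
      ret void
    }

    ; On GFX12+ these attributes are no longer read, since the mode register
    ; has no IEEE/DX10_CLAMP fields there.
    attributes #0 = { "amdgpu-ieee"="false" "amdgpu-dx10-clamp"="true" }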
@@ -4390,21 +4392,21 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC3``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table`.
415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC1``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC2``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
458:448 7 bits *See separate bits below.* Enable the setup of the
SGPR user data registers
(see
@@ -4472,8 +4474,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc1 for GFX6-GFX11
- :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table
+ .. table:: compute_pgm_rsrc1 for GFX6-GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4642,17 +4644,27 @@ The fields used by CP for code objects before V3 also match those specified in
CP is responsible for
filling in
``COMPUTE_PGM_RSRC1.PRIV``.
- 21 1 bit ENABLE_DX10_CLAMP Wavefront starts execution
- with DX10 clamp mode
- enabled. Used by the vector
- ALU to force DX10 style
- treatment of NaN's (when
- set, clamp NaN to zero,
- otherwise pass NaN
- through).
+ 21 1 bit ENABLE_DX10_CLAMP GFX9-GFX11
+ Wavefront starts execution
+ with DX10 clamp mode
+ enabled. Used by the vector
+ ALU to force DX10 style
+ treatment of NaN's (when
+ set, clamp NaN to zero,
+ otherwise pass NaN
+ through).
- Used by CP to set up
- ``COMPUTE_PGM_RSRC1.DX10_CLAMP``.
+ Used by CP to set up
+ ``COMPUTE_PGM_RSRC1.DX10_CLAMP``.
+ WG_RR_EN GFX12
+ If 1, wavefronts are scheduled
+ in a round-robin fashion with
+ respect to the other wavefronts
+ of the SIMD. Otherwise, wavefronts
+ are scheduled in oldest age order.
+
+ CP is responsible for filling in
+ ``COMPUTE_PGM_RSRC1.WG_RR_EN``.
22 1 bit DEBUG_MODE Must be 0.
Start executing wavefront
@@ -4661,21 +4673,24 @@ The fields used by CP for code objects before V3 also match those specified in
CP is responsible for
filling in
``COMPUTE_PGM_RSRC1.DEBUG_MODE``.
- 23 1 bit ENABLE_IEEE_MODE Wavefront starts execution
- with IEEE mode
- enabled. Floating point
- opcodes that support
- exception flag gathering
- will quiet and propagate
- signaling-NaN inputs per
- IEEE 754-2008. Min_dx10 and
- max_dx10 become IEEE
- 754-2008 compliant due to
- signaling-NaN propagation
- and quieting.
+ 23 1 bit ENABLE_IEEE_MODE GFX9-GFX11
+ Wavefront starts execution
+ with IEEE mode
+ enabled. Floating point
+ opcodes that support
+ exception flag gathering
+ will quiet and propagate
+ signaling-NaN inputs per
+ IEEE 754-2008. Min_dx10 and
+ max_dx10 become IEEE
+ 754-2008 compliant due to
+ signaling-NaN propagation
+ and quieting.
- Used by CP to set up
- ``COMPUTE_PGM_RSRC1.IEEE_MODE``.
+ Used by CP to set up
+ ``COMPUTE_PGM_RSRC1.IEEE_MODE``.
+ DISABLE_PERF GFX12
+ Reserved. Must be 0.
24 1 bit BULKY Must be 0.
Only one work-group allowed
@@ -4763,8 +4778,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc2 for GFX6-GFX11
- :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table
+ .. table:: compute_pgm_rsrc2 for GFX6-GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -4957,8 +4972,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc3 for GFX10-GFX11
- :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table
+ .. table:: compute_pgm_rsrc3 for GFX10-GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -5437,7 +5452,7 @@ There are different methods used for initializing flat scratch:
specifies *Architected flat scratch*:
If ENABLE_PRIVATE_SEGMENT is enabled in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table` then the FLAT_SCRATCH
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table` then the FLAT_SCRATCH
register pair will be initialized to the 64-bit address of the base of scratch
backing memory being managed by SPI for the queue executing the kernel
dispatch plus the value of the wave's Scratch Wavefront Offset for use as the
@@ -11819,7 +11834,7 @@ Wavefronts are executed in native mode with in-order reporting of loads and
sample instructions. In this mode vmcnt reports completion of load, atomic with
return and sample instructions in order, and the vscnt reports the completion of
store and atomic without return in order. See ``MEM_ORDERED`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
Wavefronts can be executed in WGP or CU wavefront execution mode:
@@ -11835,7 +11850,7 @@ Wavefronts can be executed in WGP or CU wavefront execution mode:
work-group synchronization.
See ``WGP_MODE`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table` and
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table` and
:ref:`amdgpu-target-features`.
The code sequences used to implement the memory model for GFX10-GFX11 are defined in
@@ -15375,123 +15390,125 @@ terminated by an ``.end_amdhsa_kernel`` directive.
======================================================== =================== ============ ===================
Directive Default Supported On Description
======================================================== =================== ============ ===================
- ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX11 Controls GROUP_SEGMENT_FIXED_SIZE in
+ ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX12 Controls GROUP_SEGMENT_FIXED_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX11 Controls PRIVATE_SEGMENT_FIXED_SIZE in
+ ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX12 Controls PRIVATE_SEGMENT_FIXED_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_kernarg_size`` 0 GFX6-GFX11 Controls KERNARG_SIZE in
+ ``.amdhsa_kernarg_size`` 0 GFX6-GFX12 Controls KERNARG_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX11 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`
+ ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX12 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`
``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
GFX940)
- ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_PTR in
+ ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_DISPATCH_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_QUEUE_PTR in
+ ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_QUEUE_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
+ ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_ID in
+ ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX12 Controls ENABLE_SGPR_DISPATCH_ID in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in
(except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
GFX940)
- ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX11 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
+ ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_wavefront_size32`` Target GFX10-GFX11 Controls ENABLE_WAVEFRONT_SIZE32 in
+ ``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
(wavefrontsize64)
- ``.amdhsa_uses_dynamic_stack`` 0 GFX6-GFX11 Controls USES_DYNAMIC_STACK in
+ ``.amdhsa_uses_dynamic_stack`` 0 GFX6-GFX12 Controls USES_DYNAMIC_STACK in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_PRIVATE_SEGMENT in
- (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
GFX940)
``.amdhsa_enable_private_segment`` 0 GFX940, Controls ENABLE_PRIVATE_SEGMENT in
- GFX11 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_X in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Y in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Z in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_INFO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX11 Controls ENABLE_VGPR_WORKITEM_ID in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ GFX11-GFX12 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX12 Controls ENABLE_SGPR_WORKGROUP_ID_X in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX12 Controls ENABLE_SGPR_WORKGROUP_ID_Y in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX12 Controls ENABLE_SGPR_WORKGROUP_ID_Z in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX12 Controls ENABLE_SGPR_WORKGROUP_INFO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX12 Controls ENABLE_VGPR_WORKITEM_ID in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`.
- ``.amdhsa_next_free_vgpr`` Required GFX6-GFX11 Maximum VGPR number explicitly referenced, plus one.
+ ``.amdhsa_next_free_vgpr`` Required GFX6-GFX12 Maximum VGPR number explicitly referenced, plus one.
Used to calculate GRANULATED_WORKITEM_VGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
- ``.amdhsa_next_free_sgpr`` Required GFX6-GFX11 Maximum SGPR number explicitly referenced, plus one.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
+ ``.amdhsa_next_free_sgpr`` Required GFX6-GFX12 Maximum SGPR number explicitly referenced, plus one.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_accum_offset`` Required GFX90A, Offset of a first AccVGPR in the unified register file.
GFX940 Used to calculate ACCUM_OFFSET in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
- ``.amdhsa_reserve_vcc`` 1 GFX6-GFX11 Whether the kernel may use the special VCC SGPR.
+ ``.amdhsa_reserve_vcc`` 1 GFX6-GFX12 Whether the kernel may use the special VCC SGPR.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access
(except scratch memory. Used to calculate
GFX940) GRANULATED_WAVEFRONT_SGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay.
Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
- Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
(xnack)
- ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX11 Controls FLOAT_ROUND_MODE_32 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX12 Controls FLOAT_ROUND_MODE_32 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
- ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX11 Controls FLOAT_ROUND_MODE_16_64 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX12 Controls FLOAT_ROUND_MODE_16_64 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
- ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX11 Controls FLOAT_DENORM_MODE_32 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX12 Controls FLOAT_DENORM_MODE_32 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
- ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX11 Controls FLOAT_DENORM_MODE_16_64 in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX12 Controls FLOAT_DENORM_MODE_16_64 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
``.amdhsa_dx10_clamp`` 1 GFX6-GFX11 Controls ENABLE_DX10_CLAMP in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_ieee_mode`` 1 GFX6-GFX11 Controls ENABLE_IEEE_MODE in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
- ``.amdhsa_fp16_overflow`` 0 GFX9-GFX11 Controls FP16_OVFL in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
+ ``.amdhsa_round_robin_scheduling`` 0 GFX12 Controls ENABLE_WG_RR_EN in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
+ ``.amdhsa_fp16_overflow`` 0 GFX9-GFX12 Controls FP16_OVFL in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_tg_split`` Target GFX90A, Controls TG_SPLIT in
Feature GFX940, :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
- Specific GFX11
+ Specific GFX11-GFX12
(tgsplit)
- ``.amdhsa_workgroup_processor_mode`` Target GFX10-GFX11 Controls ENABLE_WGP_MODE in
+ ``.amdhsa_workgroup_processor_mode`` Target GFX10-GFX12 Controls ENABLE_WGP_MODE in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
(cumode)
- ``.amdhsa_memory_ordered`` 1 GFX10-GFX11 Controls MEM_ORDERED in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
- ``.amdhsa_forward_progress`` 0 GFX10-GFX11 Controls FWD_PROGRESS in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`.
+ ``.amdhsa_memory_ordered`` 1 GFX10-GFX12 Controls MEM_ORDERED in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
+ ``.amdhsa_forward_progress`` 0 GFX10-GFX12 Controls FWD_PROGRESS in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_shared_vgpr_count`` 0 GFX10-GFX11 Controls SHARED_VGPR_COUNT in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
- ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
- ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
+ ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
+ ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
``.amdhsa_user_sgpr_kernarg_preload_length`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_LENGTH in
GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_user_sgpr_kernarg_preload_offset`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_OFFSET in
diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
index 0574f96e6e15c4..ba650681824171 100644
--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
@@ -88,12 +88,18 @@ enum : uint8_t {
// [GFX6-GFX9].
#define COMPUTE_PGM_RSRC1_GFX6_GFX9(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX6_GFX9_ ## NAME, SHIFT, WIDTH)
+// [GFX6-GFX11].
+#define COMPUTE_PGM_RSRC1_GFX6_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX6_GFX11_ ## NAME, SHIFT, WIDTH)
// GFX9+.
#define COMPUTE_PGM_RSRC1_GFX9_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX9_PLUS_ ## NAME, SHIFT, WIDTH)
// GFX10+.
#define COMPUTE_PGM_RSRC1_GFX10_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX10_PLUS_ ## NAME, SHIFT, WIDTH)
+// GFX12+.
+#define COMPUTE_PGM_RSRC1_GFX12_PLUS(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX12_PLUS_ ## NAME, SHIFT, WIDTH)
enum : int32_t {
COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
@@ -103,9 +109,11 @@ enum : int32_t {
COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2),
COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2),
COMPUTE_PGM_RSRC1(PRIV, 20, 1),
- COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1),
+ COMPUTE_PGM_RSRC1_GFX6_GFX11(ENABLE_DX10_CLAMP, 21, 1),
+ COMPUTE_PGM_RSRC1_GFX12_PLUS(ENABLE_WG_RR_EN, 21, 1),
COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1),
- COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1),
+ COMPUTE_PGM_RSRC1_GFX6_GFX11(ENABLE_IEEE_MODE, 23, 1),
+ COMPUTE_PGM_RSRC1_GFX12_PLUS(DISABLE_PERF, 23, 1),
COMPUTE_PGM_RSRC1(BULKY, 24, 1),
COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1),
COMPUTE_PGM_RSRC1_GFX6_GFX8(RESERVED0, 26, 1),
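The net effect of this hunk is that two COMPUTE_PGM_RSRC1 bit positions become generation-dependent: bit 21 is ENABLE_DX10_CLAMP on GFX6-GFX11 but ENABLE_WG_RR_EN on GFX12+, and bit 23 is ENABLE_IEEE_MODE on GFX6-GFX11 but DISABLE_PERF (reserved) on GFX12+. A small standalone C++ sketch of that reinterpretation, using only the shift values from the hunk rather than the LLVM macros:

    #include <cstdint>
    #include <cstdio>

    // Extract a single bit of a COMPUTE_PGM_RSRC1 value.
    static uint32_t bitAt(uint32_t Rsrc1, unsigned Shift) {
      return (Rsrc1 >> Shift) & 1u;
    }

    int main() {
      uint32_t Rsrc1 = 1u << 21;   // bit 21 set, bit 23 clear
      bool IsGFX12Plus = true;     // assumption for the example
      if (IsGFX12Plus)             // GFX12+: bit 21 is WG_RR_EN, bit 23 is DISABLE_PERF
        std::printf("WG_RR_EN=%u DISABLE_PERF=%u\n", bitAt(Rsrc1, 21), bitAt(Rsrc1, 23));
      else                         // GFX6-GFX11: bit 21 is DX10_CLAMP, bit 23 is IEEE_MODE
        std::printf("DX10_CLAMP=%u IEEE_MODE=%u\n", bitAt(Rsrc1, 21), bitAt(Rsrc1, 23));
      return 0;
    }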
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 4bf1f1357b694e..d317a733d4331c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -426,7 +426,7 @@ amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor(
memset(&KernelDescriptor, 0x0, sizeof(KernelDescriptor));
assert(isUInt<32>(PI.ScratchSize));
- assert(isUInt<32>(PI.getComputePGMRSrc1()));
+ assert(isUInt<32>(PI.getComputePGMRSrc1(STM)));
assert(isUInt<32>(PI.getComputePGMRSrc2()));
KernelDescriptor.group_segment_fixed_size = PI.LDSSize;
@@ -435,7 +435,7 @@ amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor(
Align MaxKernArgAlign;
KernelDescriptor.kernarg_size = STM.getKernArgSegmentSize(F, MaxKernArgAlign);
- KernelDescriptor.compute_pgm_rsrc1 = PI.getComputePGMRSrc1();
+ KernelDescriptor.compute_pgm_rsrc1 = PI.getComputePGMRSrc1(STM);
KernelDescriptor.compute_pgm_rsrc2 = PI.getComputePGMRSrc2();
KernelDescriptor.kernel_code_properties = getAmdhsaKernelCodeProperties(MF);
@@ -974,7 +974,7 @@ void AMDGPUAsmPrinter::EmitProgramInfoSI(const MachineFunction &MF,
if (AMDGPU::isCompute(MF.getFunction().getCallingConv())) {
OutStreamer->emitInt32(R_00B848_COMPUTE_PGM_RSRC1);
- OutStreamer->emitInt32(CurrentProgramInfo.getComputePGMRSrc1());
+ OutStreamer->emitInt32(CurrentProgramInfo.getComputePGMRSrc1(STM));
OutStreamer->emitInt32(R_00B84C_COMPUTE_PGM_RSRC2);
OutStreamer->emitInt32(CurrentProgramInfo.getComputePGMRSrc2());
@@ -1038,7 +1038,7 @@ void AMDGPUAsmPrinter::EmitPALMetadata(const MachineFunction &MF,
MD->setNumUsedSgprs(CC, CurrentProgramInfo.NumSGPRsForWavesPerEU);
if (MD->getPALMajorVersion() < 3) {
- MD->setRsrc1(CC, CurrentProgramInfo.getPGMRSrc1(CC));
+ MD->setRsrc1(CC, CurrentProgramInfo.getPGMRSrc1(CC, STM));
if (AMDGPU::isCompute(CC)) {
MD->setRsrc2(CC, CurrentProgramInfo.getComputePGMRSrc2());
} else {
@@ -1116,10 +1116,11 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const MachineFunction &MF) {
const MachineFrameInfo &MFI = MF.getFrameInfo();
StringRef FnName = MF.getFunction().getName();
MD->setFunctionScratchSize(FnName, MFI.getStackSize());
+ const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
// Set compute registers
MD->setRsrc1(CallingConv::AMDGPU_CS,
- CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
+ CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS, ST));
MD->setRsrc2(CallingConv::AMDGPU_CS, CurrentProgramInfo.getComputePGMRSrc2());
// Set optional info
@@ -1155,7 +1156,7 @@ void AMDGPUAsmPrinter::getAmdKernelCode(amd_kernel_code_t &Out,
AMDGPU::initDefaultAMDKernelCodeT(Out, &STM);
Out.compute_pgm_resource_registers =
- CurrentProgramInfo.getComputePGMRSrc1() |
+ CurrentProgramInfo.getComputePGMRSrc1(STM) |
(CurrentProgramInfo.getComputePGMRSrc2() << 32);
Out.code_properties |= AMD_CODE_PROPERTY_IS_PTR64;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index 4caa9cd9225b69..87b1957c799e2c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -2199,7 +2199,7 @@ bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
Impl.DT = DTWP ? &DTWP->getDomTree() : nullptr;
Impl.HasUnsafeFPMath = hasUnsafeFPMath(F);
- SIModeRegisterDefaults Mode(F);
+ SIModeRegisterDefaults Mode(F, *Impl.ST);
Impl.HasFP32DenormalFlush =
Mode.FP32Denormals == DenormalMode::getPreserveSign();
return Impl.run(F);
@@ -2216,7 +2216,7 @@ PreservedAnalyses AMDGPUCodeGenPreparePass::run(Function &F,
Impl.UA = &FAM.getResult<UniformityInfoAnalysis>(F);
Impl.DT = FAM.getCachedResult<DominatorTreeAnalysis>(F);
Impl.HasUnsafeFPMath = hasUnsafeFPMath(F);
- SIModeRegisterDefaults Mode(F);
+ SIModeRegisterDefaults Mode(F, *Impl.ST);
Impl.HasFP32DenormalFlush =
Mode.FP32Denormals == DenormalMode::getPreserveSign();
PreservedAnalyses PA = PreservedAnalyses::none();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index a6d1da94b89078..66ba08ef0dc12a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -132,7 +132,7 @@ bool AMDGPUDAGToDAGISel::runOnMachineFunction(MachineFunction &MF) {
}
#endif
Subtarget = &MF.getSubtarget<GCNSubtarget>();
- Mode = SIModeRegisterDefaults(MF.getFunction());
+ Mode = SIModeRegisterDefaults(MF.getFunction(), *Subtarget);
return SelectionDAGISel::runOnMachineFunction(MF);
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 0c38fa32c6f33a..c59368a0933c57 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1499,13 +1499,13 @@ bool GCNTargetMachine::parseMachineFunctionInfo(
static_cast<const yaml::SIMachineFunctionInfo &>(MFI_);
MachineFunction &MF = PFS.MF;
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+ const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
if (MFI->initializeBaseYamlFields(YamlMFI, MF, PFS, Error, SourceRange))
return true;
if (MFI->Occupancy == 0) {
// Fixup the subtarget dependent default value.
- const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
MFI->Occupancy = ST.computeOccupancy(MF.getFunction(), MFI->getLDSSize());
}
@@ -1659,8 +1659,10 @@ bool GCNTargetMachine::parseMachineFunctionInfo(
MFI->ArgInfo.WorkItemIDZ, 0, 0)))
return true;
- MFI->Mode.IEEE = YamlMFI.Mode.IEEE;
- MFI->Mode.DX10Clamp = YamlMFI.Mode.DX10Clamp;
+ if (ST.hasIEEEMode())
+ MFI->Mode.IEEE = YamlMFI.Mode.IEEE;
+ if (ST.hasDX10ClampMode())
+ MFI->Mode.DX10Clamp = YamlMFI.Mode.DX10Clamp;
// FIXME: Move proper support for denormal-fp-math into base MachineFunction
MFI->Mode.FP32Denormals.Input = YamlMFI.Mode.FP32InputDenormals
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index cb877a4695f1ec..8155a8e00bc416 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -296,7 +296,7 @@ GCNTTIImpl::GCNTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
ST(static_cast<const GCNSubtarget *>(TM->getSubtargetImpl(F))),
TLI(ST->getTargetLowering()), CommonTTI(TM, F),
IsGraphics(AMDGPU::isGraphics(F.getCallingConv())) {
- SIModeRegisterDefaults Mode(F);
+ SIModeRegisterDefaults Mode(F, *ST);
HasFP32Denormals = Mode.FP32Denormals != DenormalMode::getPreserveSign();
HasFP64FP16Denormals =
Mode.FP64FP16Denormals != DenormalMode::getPreserveSign();
@@ -1163,8 +1163,8 @@ bool GCNTTIImpl::areInlineCompatible(const Function *Caller,
// FIXME: dx10_clamp can just take the caller setting, but there seems to be
// no way to support merge for backend defined attributes.
- SIModeRegisterDefaults CallerMode(*Caller);
- SIModeRegisterDefaults CalleeMode(*Callee);
+ SIModeRegisterDefaults CallerMode(*Caller, *CallerST);
+ SIModeRegisterDefaults CalleeMode(*Callee, *CalleeST);
if (!CallerMode.isInlineCompatible(CalleeMode))
return false;
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 092845d391a3b0..d2015c615846ca 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -5334,10 +5334,16 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64, Val,
ValRange);
} else if (ID == ".amdhsa_dx10_clamp") {
+ if (IVersion.Major >= 12)
+ return Error(IDRange.Start, "directive unsupported on gfx12+", IDRange);
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1,
- COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP, Val, ValRange);
+ COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP,
+ Val, ValRange);
} else if (ID == ".amdhsa_ieee_mode") {
- PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE,
+ if (IVersion.Major >= 12)
+ return Error(IDRange.Start, "directive unsupported on gfx12+", IDRange);
+ PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1,
+ COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE,
Val, ValRange);
} else if (ID == ".amdhsa_fp16_overflow") {
if (IVersion.Major < 9)
@@ -5401,6 +5407,12 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2,
COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO,
Val, ValRange);
+ } else if (ID == ".amdhsa_round_robin_scheduling") {
+ if (IVersion.Major < 12)
+ return Error(IDRange.Start, "directive requires gfx12+", IDRange);
+ PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1,
+ COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN,
+ Val, ValRange);
} else {
return Error(IDRange.Start, "unknown .amdhsa_kernel directive", IDRange);
}
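With these checks, the legacy mode directives are rejected when assembling for gfx12, and the new directive is rejected on older targets; parsing stops at the first error. A sketch of the expected diagnostics (file and kernel names are illustrative):

    // llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 legacy.s
    .amdhsa_kernel legacy_modes
      .amdhsa_next_free_vgpr 4
      .amdhsa_next_free_sgpr 8
      .amdhsa_ieee_mode 1                // error: directive unsupported on gfx12+
    .end_amdhsa_kernel

    // llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1100 rr.s
    .amdhsa_kernel too_new
      .amdhsa_next_free_vgpr 4
      .amdhsa_next_free_sgpr 8
      .amdhsa_round_robin_scheduling 1   // error: directive requires gfx12+
    .end_amdhsa_kernel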
@@ -5554,6 +5566,18 @@ bool AMDGPUAsmParser::ParseAMDKernelCodeTValue(StringRef ID,
}
Lex();
+ if (ID == "enable_dx10_clamp") {
+ if (G_00B848_DX10_CLAMP(Header.compute_pgm_resource_registers) &&
+ isGFX12Plus())
+ return TokError("enable_dx10_clamp=1 is not allowed on GFX12+");
+ }
+
+ if (ID == "enable_ieee_mode") {
+ if (G_00B848_IEEE_MODE(Header.compute_pgm_resource_registers) &&
+ isGFX12Plus())
+ return TokError("enable_ieee_mode=1 is not allowed on GFX12+");
+ }
+
if (ID == "enable_wavefront_size32") {
if (Header.code_properties & AMD_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32) {
if (!isGFX10Plus())
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 1f11beb71101bc..20b31afcbfa686 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -1836,12 +1836,16 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIV)
return MCDisassembler::Fail;
- PRINT_DIRECTIVE(".amdhsa_dx10_clamp", COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP);
+ if (!isGFX12Plus())
+ PRINT_DIRECTIVE(".amdhsa_dx10_clamp",
+ COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP);
if (FourByteBuffer & COMPUTE_PGM_RSRC1_DEBUG_MODE)
return MCDisassembler::Fail;
- PRINT_DIRECTIVE(".amdhsa_ieee_mode", COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE);
+ if (!isGFX12Plus())
+ PRINT_DIRECTIVE(".amdhsa_ieee_mode",
+ COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE);
if (FourByteBuffer & COMPUTE_PGM_RSRC1_BULKY)
return MCDisassembler::Fail;
@@ -1867,6 +1871,11 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
PRINT_DIRECTIVE(".amdhsa_memory_ordered", COMPUTE_PGM_RSRC1_GFX10_PLUS_MEM_ORDERED);
PRINT_DIRECTIVE(".amdhsa_forward_progress", COMPUTE_PGM_RSRC1_GFX10_PLUS_FWD_PROGRESS);
}
+
+ if (isGFX12Plus())
+ PRINT_DIRECTIVE(".amdhsa_round_robin_scheduling",
+ COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN);
+
return MCDisassembler::Success;
}
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 94b9e49b765a6f..b48e7f3959a4ec 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -1210,6 +1210,15 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
// \returns true is CSUB atomics support a no-return form.
bool hasAtomicCSubNoRtnInsts() const { return HasAtomicCSubNoRtnInsts; }
+ // \returns true if the target has DX10_CLAMP kernel descriptor mode bit
+ bool hasDX10ClampMode() const { return getGeneration() < GFX12; }
+
+ // \returns true if the target has IEEE kernel descriptor mode bit
+ bool hasIEEEMode() const { return getGeneration() < GFX12; }
+
+ // \returns true if the target has WG_RR_MODE kernel descriptor mode bit
+ bool hasRrWGMode() const { return getGeneration() >= GFX12; }
+
/// \returns SGPR allocation granularity supported by the subtarget.
unsigned getSGPRAllocGranule() const {
return AMDGPU::IsaInfo::getSGPRAllocGranule(this);
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index eba8e49a46f82f..693127c3e89709 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -451,12 +451,14 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
PRINT_FIELD(OS, ".amdhsa_float_denorm_mode_16_64", KD,
compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64);
- PRINT_FIELD(OS, ".amdhsa_dx10_clamp", KD,
- compute_pgm_rsrc1,
- amdhsa::COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP);
- PRINT_FIELD(OS, ".amdhsa_ieee_mode", KD,
- compute_pgm_rsrc1,
- amdhsa::COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE);
+ if (IVersion.Major < 12) {
+ PRINT_FIELD(OS, ".amdhsa_dx10_clamp", KD,
+ compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP);
+ PRINT_FIELD(OS, ".amdhsa_ieee_mode", KD,
+ compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE);
+ }
if (IVersion.Major >= 9)
PRINT_FIELD(OS, ".amdhsa_fp16_overflow", KD,
compute_pgm_rsrc1,
@@ -478,6 +480,10 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
PRINT_FIELD(OS, ".amdhsa_shared_vgpr_count", KD, compute_pgm_rsrc3,
amdhsa::COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
}
+ if (IVersion.Major >= 12)
+ PRINT_FIELD(OS, ".amdhsa_round_robin_scheduling", KD,
+ compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN);
PRINT_FIELD(
OS, ".amdhsa_exception_fp_ieee_invalid_op", KD,
compute_pgm_rsrc2,
diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index 47dc59e77dc4e7..a63db6c22e0698 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -1117,6 +1117,9 @@ enum Register_Flag : uint8_t {
#define S_00B848_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_00B848_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_00B848_DX10_CLAMP 0xFFDFFFFF
+#define S_00B848_RR_WG_MODE(x) (((x) & 0x1) << 21)
+#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)
+#define C_00B848_RR_WG_MODE 0xFFDFFFFF
#define S_00B848_DEBUG_MODE(x) (((x) & 0x1) << 22)
#define G_00B848_DEBUG_MODE(x) (((x) >> 22) & 0x1)
#define C_00B848_DEBUG_MODE 0xFFBFFFFF
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index f8eb67199f623e..6fbf175c336efe 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -37,7 +37,7 @@ const GCNTargetMachine &getTM(const GCNSubtarget *STI) {
SIMachineFunctionInfo::SIMachineFunctionInfo(const Function &F,
const GCNSubtarget *STI)
- : AMDGPUMachineFunction(F, *STI), Mode(F), GWSResourcePSV(getTM(STI)),
+ : AMDGPUMachineFunction(F, *STI), Mode(F, *STI), GWSResourcePSV(getTM(STI)),
UserSGPRInfo(F, *STI), WorkGroupIDX(false), WorkGroupIDY(false),
WorkGroupIDZ(false), WorkGroupInfo(false), LDSKernelId(false),
PrivateSegmentWaveByteOffset(false), WorkItemIDX(false),
diff --git a/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp b/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp
index ffed268244eda0..2684a1e3c3358a 100644
--- a/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp
+++ b/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp
@@ -7,20 +7,26 @@
//===----------------------------------------------------------------------===//
#include "SIModeRegisterDefaults.h"
+#include "GCNSubtarget.h"
using namespace llvm;
-SIModeRegisterDefaults::SIModeRegisterDefaults(const Function &F) {
+SIModeRegisterDefaults::SIModeRegisterDefaults(const Function &F,
+ const GCNSubtarget &ST) {
*this = getDefaultForCallingConv(F.getCallingConv());
- StringRef IEEEAttr = F.getFnAttribute("amdgpu-ieee").getValueAsString();
- if (!IEEEAttr.empty())
- IEEE = IEEEAttr == "true";
+ if (ST.hasIEEEMode()) {
+ StringRef IEEEAttr = F.getFnAttribute("amdgpu-ieee").getValueAsString();
+ if (!IEEEAttr.empty())
+ IEEE = IEEEAttr == "true";
+ }
- StringRef DX10ClampAttr =
- F.getFnAttribute("amdgpu-dx10-clamp").getValueAsString();
- if (!DX10ClampAttr.empty())
- DX10Clamp = DX10ClampAttr == "true";
+ if (ST.hasDX10ClampMode()) {
+ StringRef DX10ClampAttr =
+ F.getFnAttribute("amdgpu-dx10-clamp").getValueAsString();
+ if (!DX10ClampAttr.empty())
+ DX10Clamp = DX10ClampAttr == "true";
+ }
StringRef DenormF32Attr =
F.getFnAttribute("denormal-fp-math-f32").getValueAsString();
diff --git a/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h b/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h
index 58e2c67c248bd7..9fbd74c3eede32 100644
--- a/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h
+++ b/llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h
@@ -14,6 +14,8 @@
namespace llvm {
+class GCNSubtarget;
+
// Track defaults for fields in the MODE register.
struct SIModeRegisterDefaults {
/// Floating point opcodes that support exception flag gathering quiet and
@@ -40,7 +42,7 @@ struct SIModeRegisterDefaults {
FP32Denormals(DenormalMode::getIEEE()),
FP64FP16Denormals(DenormalMode::getIEEE()) {}
- SIModeRegisterDefaults(const Function &F);
+ SIModeRegisterDefaults(const Function &F, const GCNSubtarget &ST);
static SIModeRegisterDefaults getDefaultForCallingConv(CallingConv::ID CC) {
SIModeRegisterDefaults Mode;
diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index b6839c8308d8e8..e0763ba9997def 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -14,28 +14,49 @@
//===----------------------------------------------------------------------===//
//
+#include "GCNSubtarget.h"
#include "SIProgramInfo.h"
#include "SIDefines.h"
#include "Utils/AMDGPUBaseInfo.h"
using namespace llvm;
-uint64_t SIProgramInfo::getComputePGMRSrc1() const {
- return S_00B848_VGPRS(VGPRBlocks) | S_00B848_SGPRS(SGPRBlocks) |
- S_00B848_PRIORITY(Priority) | S_00B848_FLOAT_MODE(FloatMode) |
- S_00B848_PRIV(Priv) | S_00B848_DX10_CLAMP(DX10Clamp) |
- S_00B848_DEBUG_MODE(DebugMode) | S_00B848_IEEE_MODE(IEEEMode) |
- S_00B848_WGP_MODE(WgpMode) | S_00B848_MEM_ORDERED(MemOrdered);
+uint64_t SIProgramInfo::getComputePGMRSrc1(const GCNSubtarget &ST) const {
+ uint64_t Reg = S_00B848_VGPRS(VGPRBlocks) | S_00B848_SGPRS(SGPRBlocks) |
+ S_00B848_PRIORITY(Priority) | S_00B848_FLOAT_MODE(FloatMode) |
+ S_00B848_PRIV(Priv) | S_00B848_DEBUG_MODE(DebugMode) |
+ S_00B848_WGP_MODE(WgpMode) | S_00B848_MEM_ORDERED(MemOrdered);
+
+ if (ST.hasDX10ClampMode())
+ Reg |= S_00B848_DX10_CLAMP(DX10Clamp);
+
+ if (ST.hasIEEEMode())
+ Reg |= S_00B848_IEEE_MODE(IEEEMode);
+
+ if (ST.hasRrWGMode())
+ Reg |= S_00B848_RR_WG_MODE(RrWgMode);
+
+ return Reg;
}
-uint64_t SIProgramInfo::getPGMRSrc1(CallingConv::ID CC) const {
+uint64_t SIProgramInfo::getPGMRSrc1(CallingConv::ID CC,
+ const GCNSubtarget &ST) const {
if (AMDGPU::isCompute(CC)) {
- return getComputePGMRSrc1();
+ return getComputePGMRSrc1(ST);
}
uint64_t Reg = S_00B848_VGPRS(VGPRBlocks) | S_00B848_SGPRS(SGPRBlocks) |
S_00B848_PRIORITY(Priority) | S_00B848_FLOAT_MODE(FloatMode) |
- S_00B848_PRIV(Priv) | S_00B848_DX10_CLAMP(DX10Clamp) |
- S_00B848_DEBUG_MODE(DebugMode) | S_00B848_IEEE_MODE(IEEEMode);
+ S_00B848_PRIV(Priv) | S_00B848_DEBUG_MODE(DebugMode);
+
+ if (ST.hasDX10ClampMode())
+ Reg |= S_00B848_DX10_CLAMP(DX10Clamp);
+
+ if (ST.hasIEEEMode())
+ Reg |= S_00B848_IEEE_MODE(IEEEMode);
+
+ if (ST.hasRrWGMode())
+ Reg |= S_00B848_RR_WG_MODE(RrWgMode);
+
switch (CC) {
case CallingConv::AMDGPU_PS:
Reg |= S_00B028_MEM_ORDERED(MemOrdered);
diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.h b/llvm/lib/Target/AMDGPU/SIProgramInfo.h
index aab127e494630a..a178a75c448a89 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.h
@@ -21,6 +21,8 @@
namespace llvm {
+class GCNSubtarget;
+
/// Track resource usage for kernels / entry functions.
struct SIProgramInfo {
// Fields set in PGM_RSRC1 pm4 packet.
@@ -34,6 +36,7 @@ struct SIProgramInfo {
uint32_t IEEEMode = 0;
uint32_t WgpMode = 0; // GFX10+
uint32_t MemOrdered = 0; // GFX10+
+ uint32_t RrWgMode = 0; // GFX12+
uint64_t ScratchSize = 0;
// State used to calculate fields set in PGM_RSRC2 pm4 packet.
@@ -85,8 +88,8 @@ struct SIProgramInfo {
SIProgramInfo() = default;
/// Compute the value of the ComputePGMRsrc1 register.
- uint64_t getComputePGMRSrc1() const;
- uint64_t getPGMRSrc1(CallingConv::ID CC) const;
+ uint64_t getComputePGMRSrc1(const GCNSubtarget &ST) const;
+ uint64_t getPGMRSrc1(CallingConv::ID CC, const GCNSubtarget &ST) const;
/// Compute the value of the ComputePGMRsrc2 register.
uint64_t getComputePGMRSrc2() const;
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 68d561a0d9f780..93007286a511d2 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -1131,10 +1131,17 @@ amdhsa::kernel_descriptor_t getDefaultAmdhsaKernelDescriptor(
AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64,
amdhsa::FLOAT_DENORM_MODE_FLUSH_NONE);
- AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
- amdhsa::COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP, 1);
- AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
- amdhsa::COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE, 1);
+ if (Version.Major >= 12) {
+ AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN, 0);
+ AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX12_PLUS_DISABLE_PERF, 0);
+ } else {
+ AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP, 1);
+ AMDHSA_BITS_SET(KD.compute_pgm_rsrc1,
+ amdhsa::COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE, 1);
+ }
AMDHSA_BITS_SET(KD.compute_pgm_rsrc2,
amdhsa::COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_X, 1);
if (Version.Major >= 10) {
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-fmed3-const-combine.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-fmed3-const-combine.ll
index ea46f4d2d419e8..62e5bce23664cd 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-fmed3-const-combine.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-fmed3-const-combine.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10 %s
+; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX12 %s
define float @test_fmed3_f32_known_nnan_ieee_true(float %a) #0 {
; GFX10-LABEL: test_fmed3_f32_known_nnan_ieee_true:
@@ -7,6 +8,12 @@ define float @test_fmed3_f32_known_nnan_ieee_true(float %a) #0 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_f32_known_nnan_ieee_true:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%fmed = call nnan float @llvm.amdgcn.fmed3.f32(float %fmul, float 0.0, float 1.0)
ret float %fmed
@@ -18,6 +25,12 @@ define half @test_fmed3_f16_known_nnan_ieee_false(half %a) #1 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f16_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_f16_known_nnan_ieee_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f16_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul half %a, 2.0
%fmed = call nnan half @llvm.amdgcn.fmed3.f16(half %fmul, half 0.0, half 1.0)
ret half %fmed
@@ -31,6 +44,14 @@ define float @test_fmed3_non_SNaN_input_ieee_true_dx10clamp_true(float %a) #2 {
; GFX10-NEXT: v_max_f32_e32 v0, v0, v0
; GFX10-NEXT: v_min_f32_e64 v0, 0x41200000, v0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_true:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v0, v0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_min_num_f32_e64 v0, 0x41200000, v0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmin = call float @llvm.minnum.f32(float %a, float 10.0)
%fmed = call float @llvm.amdgcn.fmed3.f32(float %fmin, float 0.0, float 1.0)
ret float %fmed
@@ -43,6 +64,12 @@ define float @test_fmed3_maybe_SNaN_input_zero_third_operand_ieee_true_dx10clamp
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_maybe_SNaN_input_zero_third_operand_ieee_true_dx10clamp_true:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%fmed = call float @llvm.amdgcn.fmed3.f32(float %fmul, float 1.0, float 0.0)
ret float %fmed
@@ -56,6 +83,12 @@ define float @test_fmed3_global_nnan(float %a) #3 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_global_nnan:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%fmed = call float @llvm.amdgcn.fmed3.f32(float %fmul, float 0.0, float 1.0)
ret float %fmed
@@ -73,6 +106,12 @@ define float @test_fmed3_f32_maybe_NaN_ieee_false(float %a) #1 {
; GFX10-NEXT: v_mul_f32_e32 v0, 2.0, v0
; GFX10-NEXT: v_med3_f32 v0, v0, 1.0, 0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_f32_maybe_NaN_ieee_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%fmed = call float @llvm.amdgcn.fmed3.f32(float %fmul, float 1.0, float 0.0)
ret float %fmed
@@ -87,6 +126,14 @@ define float @test_fmed3_non_SNaN_input_ieee_true_dx10clamp_false(float %a) #4 {
; GFX10-NEXT: v_min_f32_e32 v0, 0x41200000, v0
; GFX10-NEXT: v_med3_f32 v0, v0, 0, 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v0, v0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_min_num_f32_e64 v0, 0x41200000, v0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmin = call float @llvm.minnum.f32(float %a, float 10.0)
%fmed = call float @llvm.amdgcn.fmed3.f32(float %fmin, float 0.0, float 1.0)
ret float %fmed
@@ -99,6 +146,12 @@ define float @test_fmed3_maybe_SNaN_input_ieee_true_dx10clamp_true(float %a) #2
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_fmed3_maybe_SNaN_input_ieee_true_dx10clamp_true:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%fmed = call float @llvm.amdgcn.fmed3.f32(float %fmul, float 0.0, float 1.0)
ret float %fmed
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll
index 4f75d205cda35c..bba3687dbbc2cc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10 %s
+; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX12 %s
define float @test_min_max_ValK0_K1_f32(float %a) #0 {
; GFX10-LABEL: test_min_max_ValK0_K1_f32:
@@ -7,6 +8,12 @@ define float @test_min_max_ValK0_K1_f32(float %a) #0 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_ValK0_K1_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%maxnum = call nnan float @llvm.maxnum.f32(float %fmul, float 0.0)
%fmed = call nnan float @llvm.minnum.f32(float %maxnum, float 1.0)
@@ -19,6 +26,12 @@ define double @test_min_max_K0Val_K1_f64(double %a) #1 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f64 v[0:1], v[0:1], 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_K0Val_K1_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f64_e64 v[0:1], v[0:1], 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul double %a, 2.0
%maxnum = call nnan double @llvm.maxnum.f64(double 0.0, double %fmul)
%fmed = call nnan double @llvm.minnum.f64(double %maxnum, double 1.0)
@@ -32,6 +45,12 @@ define half @test_min_K1max_ValK0_f16(half %a) #2 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f16_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_K1max_ValK0_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f16_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul half %a, 2.0
%maxnum = call half @llvm.maxnum.f16(half %fmul, half 0.0)
%fmed = call half @llvm.minnum.f16(half 1.0, half %maxnum)
@@ -44,6 +63,12 @@ define <2 x half> @test_min_K1max_K0Val_f16(<2 x half> %a) #1 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_pk_mul_f16 v0, v0, 2.0 op_sel_hi:[1,0] clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_K1max_K0Val_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_pk_mul_f16 v0, v0, 2.0 op_sel_hi:[1,0] clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul <2 x half> %a, <half 2.0, half 2.0>
%maxnum = call nnan <2 x half> @llvm.maxnum.v2f16(<2 x half> <half 0.0, half 0.0>, <2 x half> %fmul)
%fmed = call nnan <2 x half> @llvm.minnum.v2f16(<2 x half> <half 1.0, half 1.0>, <2 x half> %maxnum)
@@ -56,6 +81,12 @@ define <2 x half> @test_min_max_splat_padded_with_undef(<2 x half> %a) #2 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_pk_mul_f16 v0, v0, 2.0 op_sel_hi:[1,0] clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_splat_padded_with_undef:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_pk_mul_f16 v0, v0, 2.0 op_sel_hi:[1,0] clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul <2 x half> %a, <half 2.0, half 2.0>
%maxnum = call <2 x half> @llvm.maxnum.v2f16(<2 x half> <half 0.0, half undef>, <2 x half> %fmul)
%fmed = call <2 x half> @llvm.minnum.v2f16(<2 x half> <half 1.0, half undef>, <2 x half> %maxnum)
@@ -70,6 +101,12 @@ define float @test_max_min_ValK1_K0_f32(float %a) #0 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_ValK1_K0_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%minnum = call nnan float @llvm.minnum.f32(float %fmul, float 1.0)
%fmed = call nnan float @llvm.maxnum.f32(float %minnum, float 0.0)
@@ -82,6 +119,12 @@ define double @test_max_min_K1Val_K0_f64(double %a) #1 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f64 v[0:1], v[0:1], 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_K1Val_K0_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f64_e64 v[0:1], v[0:1], 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul double %a, 2.0
%minnum = call nnan double @llvm.minnum.f64(double 1.0, double %fmul)
%fmed = call nnan double @llvm.maxnum.f64(double %minnum, double 0.0)
@@ -94,6 +137,12 @@ define half @test_max_K0min_ValK1_f16(half %a) #0 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mul_f16_e64 v0, v0, 2.0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_K0min_ValK1_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f16_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul half %a, 2.0
%minnum = call nnan half @llvm.minnum.f16(half %fmul, half 1.0)
%fmed = call nnan half @llvm.maxnum.f16(half 0.0, half %minnum)
@@ -107,6 +156,12 @@ define <2 x half> @test_max_K0min_K1Val_v2f16(<2 x half> %a) #1 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_pk_mul_f16 v0, v0, 2.0 op_sel_hi:[1,0] clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_K0min_K1Val_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_pk_mul_f16 v0, v0, 2.0 op_sel_hi:[1,0] clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul <2 x half> %a, <half 2.0, half 2.0>
%minnum = call nnan <2 x half> @llvm.minnum.v2f16(<2 x half> <half 1.0, half undef>, <2 x half> %fmul)
%fmed = call nnan <2 x half> @llvm.maxnum.v2f16(<2 x half> <half undef, half 0.0>, <2 x half> %minnum)
@@ -121,6 +176,12 @@ define float @test_min_max_global_nnan(float %a) #3 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_max_f32_e64 v0, v0, v0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_global_nnan:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v0, v0, v0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call float @llvm.maxnum.f32(float %a, float 0.0)
%fmed = call float @llvm.minnum.f32(float %maxnum, float 1.0)
ret float %fmed
@@ -132,6 +193,12 @@ define float @test_max_min_global_nnan(float %a) #3 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_max_f32_e64 v0, v0, v0 clamp
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_global_nnan:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v0, v0, v0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call float @llvm.minnum.f32(float %a, float 1.0)
%fmed = call float @llvm.maxnum.f32(float %minnum, float 0.0)
ret float %fmed
@@ -149,6 +216,12 @@ define float @test_min_max_K0_gt_K1(float %a) #0 {
; GFX10-NEXT: v_max_f32_e32 v0, 1.0, v0
; GFX10-NEXT: v_min_f32_e32 v0, 0, v0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_K0_gt_K1:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_maxmin_num_f32 v0, v0, 1.0, 0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan float @llvm.maxnum.f32(float %a, float 1.0)
%fmed = call nnan float @llvm.minnum.f32(float %maxnum, float 0.0)
ret float %fmed
@@ -162,6 +235,12 @@ define float @test_max_min_K0_gt_K1(float %a) #0 {
; GFX10-NEXT: v_min_f32_e32 v0, 0, v0
; GFX10-NEXT: v_max_f32_e32 v0, 1.0, v0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_K0_gt_K1:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 0, 1.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call nnan float @llvm.minnum.f32(float %a, float 0.0)
%fmed = call nnan float @llvm.maxnum.f32(float %minnum, float 1.0)
ret float %fmed
@@ -178,6 +257,12 @@ define float @test_min_max_maybe_NaN_input_ieee_false(float %a) #1 {
; GFX10-NEXT: v_max_f32_e32 v0, 0, v0
; GFX10-NEXT: v_min_f32_e32 v0, 1.0, v0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_maybe_NaN_input_ieee_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%maxnum = call float @llvm.maxnum.f32(float %fmul, float 0.0)
%fmed = call float @llvm.minnum.f32(float %maxnum, float 1.0)
@@ -192,6 +277,12 @@ define float @test_min_max_maybe_NaN_input_ieee_true_dx10clamp_false(float %a) #
; GFX10-NEXT: v_mul_f32_e32 v0, 2.0, v0
; GFX10-NEXT: v_med3_f32 v0, v0, 0, 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_maybe_NaN_input_ieee_true_dx10clamp_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e64 v0, v0, 2.0 clamp
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%maxnum = call float @llvm.maxnum.f32(float %fmul, float 0.0)
%fmed = call float @llvm.minnum.f32(float %maxnum, float 1.0)
@@ -208,6 +299,14 @@ define float @test_max_min_maybe_NaN_input_ieee_true(float %a) #0 {
; GFX10-NEXT: v_min_f32_e32 v0, 1.0, v0
; GFX10-NEXT: v_max_f32_e32 v0, 0, v0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_maybe_NaN_input_ieee_true:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e32 v0, 2.0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 1.0, 0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%minnum = call float @llvm.minnum.f32(float %fmul, float 1.0)
%fmed = call float @llvm.maxnum.f32(float %minnum, float 0.0)
@@ -222,6 +321,14 @@ define float @test_max_min_maybe_NaN_input_ieee_false(float %a) #1 {
; GFX10-NEXT: v_min_f32_e32 v0, 1.0, v0
; GFX10-NEXT: v_max_f32_e32 v0, 0, v0
; GFX10-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_maybe_NaN_input_ieee_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_mul_f32_e32 v0, 2.0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 1.0, 0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fmul = fmul float %a, 2.0
%minnum = call float @llvm.minnum.f32(float %fmul, float 1.0)
%fmed = call float @llvm.maxnum.f32(float %minnum, float 0.0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3-min-max-const-combine.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3-min-max-const-combine.ll
index d6d36fe1acf35f..096ca5bc8705f5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3-min-max-const-combine.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3-min-max-const-combine.ll
@@ -1,6 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10 %s
; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX8 %s
+; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX12 %s
define float @test_min_max_ValK0_K1_f32(float %a) #0 {
; GFX10-LABEL: test_min_max_ValK0_K1_f32:
@@ -14,6 +15,12 @@ define float @test_min_max_ValK0_K1_f32(float %a) #0 {
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_ValK0_K1_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan float @llvm.maxnum.f32(float %a, float 2.0)
%fmed = call nnan float @llvm.minnum.f32(float %maxnum, float 4.0)
ret float %fmed
@@ -31,6 +38,12 @@ define float @test_min_max_K0Val_K1_f32(float %a) #1 {
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_K0Val_K1_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan float @llvm.maxnum.f32(float 2.0, float %a)
%fmed = call nnan float @llvm.minnum.f32(float %maxnum, float 4.0)
ret float %fmed
@@ -53,6 +66,14 @@ define half @test_min_K1max_ValK0_f16(half %a) #0 {
; GFX8-NEXT: v_max_f16_e32 v0, 2.0, v0
; GFX8-NEXT: v_min_f16_e32 v0, 4.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_K1max_ValK0_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f16_e32 v0, v0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_med3_num_f16 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call half @llvm.maxnum.f16(half %a, half 2.0)
%fmed = call half @llvm.minnum.f16(half 4.0, half %maxnum)
ret half %fmed
@@ -71,6 +92,12 @@ define half @test_min_K1max_K0Val_f16(half %a) #1 {
; GFX8-NEXT: v_max_f16_e32 v0, 2.0, v0
; GFX8-NEXT: v_min_f16_e32 v0, 4.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_K1max_K0Val_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f16 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan half @llvm.maxnum.f16(half 2.0, half %a)
%fmed = call nnan half @llvm.minnum.f16(half 4.0, half %maxnum)
ret half %fmed
@@ -89,6 +116,12 @@ define float @test_max_min_ValK1_K0_f32(float %a) #0 {
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_ValK1_K0_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call nnan float @llvm.minnum.f32(float %a, float 4.0)
%fmed = call nnan float @llvm.maxnum.f32(float %minnum, float 2.0)
ret float %fmed
@@ -106,6 +139,12 @@ define float @test_max_min_K1Val_K0_f32(float %a) #1 {
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_K1Val_K0_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call nnan float @llvm.minnum.f32(float 4.0, float %a)
%fmed = call nnan float @llvm.maxnum.f32(float %minnum, float 2.0)
ret float %fmed
@@ -124,6 +163,12 @@ define half @test_max_K0min_ValK1_f16(half %a) #0 {
; GFX8-NEXT: v_min_f16_e32 v0, 4.0, v0
; GFX8-NEXT: v_max_f16_e32 v0, 2.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_K0min_ValK1_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f16 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call nnan half @llvm.minnum.f16(half %a, half 4.0)
%fmed = call nnan half @llvm.maxnum.f16(half 2.0, half %minnum)
ret half %fmed
@@ -142,6 +187,12 @@ define half @test_max_K0min_K1Val_f16(half %a) #1 {
; GFX8-NEXT: v_min_f16_e32 v0, 4.0, v0
; GFX8-NEXT: v_max_f16_e32 v0, 2.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_K0min_K1Val_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f16 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call nnan half @llvm.minnum.f16(half 4.0, half %a)
%fmed = call nnan half @llvm.maxnum.f16(half 2.0, half %minnum)
ret half %fmed
@@ -161,6 +212,12 @@ define float @test_min_max_global_nnan(float %a) #2 {
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_global_nnan:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call float @llvm.maxnum.f32(float %a, float 2.0)
%fmed = call float @llvm.minnum.f32(float %maxnum, float 4.0)
ret float %fmed
@@ -178,6 +235,12 @@ define float @test_max_min_global_nnan(float %a) #2 {
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_global_nnan:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call float @llvm.minnum.f32(float %a, float 4.0)
%fmed = call float @llvm.maxnum.f32(float %minnum, float 2.0)
ret float %fmed
@@ -202,6 +265,12 @@ define float @test_min_max_K0_gt_K1(float %a) #0 {
; GFX8-NEXT: v_max_f32_e32 v0, 4.0, v0
; GFX8-NEXT: v_min_f32_e32 v0, 2.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_K0_gt_K1:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_maxmin_num_f32 v0, v0, 4.0, 2.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan float @llvm.maxnum.f32(float %a, float 4.0)
%fmed = call nnan float @llvm.minnum.f32(float %maxnum, float 2.0)
ret float %fmed
@@ -222,6 +291,12 @@ define float @test_max_min_K0_gt_K1(float %a) #0 {
; GFX8-NEXT: v_min_f32_e32 v0, 2.0, v0
; GFX8-NEXT: v_max_f32_e32 v0, 4.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_K0_gt_K1:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call nnan float @llvm.minnum.f32(float %a, float 2.0)
%fmed = call nnan float @llvm.maxnum.f32(float %minnum, float 4.0)
ret float %fmed
@@ -242,6 +317,12 @@ define float @test_min_max_non_inline_const(float %a) #0 {
; GFX8-NEXT: v_max_f32_e32 v0, 2.0, v0
; GFX8-NEXT: v_min_f32_e32 v0, 0x41000000, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_non_inline_const:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_maxmin_num_f32 v0, v0, 2.0, 0x41000000
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan float @llvm.maxnum.f32(float %a, float 2.0)
%fmed = call nnan float @llvm.minnum.f32(float %maxnum, float 8.0)
ret float %fmed
@@ -263,6 +344,14 @@ define double @test_min_max_f64(double %a) #0 {
; GFX8-NEXT: v_max_f64 v[0:1], v[0:1], 2.0
; GFX8-NEXT: v_min_f64 v[0:1], v[0:1], 4.0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f64_e32 v[0:1], 2.0, v[0:1]
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_min_num_f64_e32 v[0:1], 4.0, v[0:1]
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan double @llvm.maxnum.f64(double %a, double 2.0)
%fmed = call nnan double @llvm.minnum.f64(double %maxnum, double 4.0)
ret double %fmed
@@ -287,6 +376,14 @@ define <2 x half> @test_min_max_v2f16(<2 x half> %a) #0 {
; GFX8-NEXT: v_min_f16_sdwa v0, v0, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_or_b32_e32 v0, v1, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v0, v0, 2.0 op_sel_hi:[1,0]
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_pk_min_num_f16 v0, v0, 4.0 op_sel_hi:[1,0]
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call nnan <2 x half> @llvm.maxnum.v2f16(<2 x half> %a, <2 x half> <half 2.0, half 2.0>)
%fmed = call nnan <2 x half> @llvm.minnum.v2f16(<2 x half> %maxnum, <2 x half> <half 4.0, half 4.0>)
ret <2 x half> %fmed
@@ -309,6 +406,14 @@ define float @test_min_max_maybe_NaN_input_ieee_false(float %a) #1 {
; GFX8-NEXT: v_max_f32_e32 v0, 2.0, v0
; GFX8-NEXT: v_min_f32_e32 v0, 4.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_min_max_maybe_NaN_input_ieee_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v0, v0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_med3_num_f32 v0, v0, 2.0, 4.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%maxnum = call float @llvm.maxnum.f32(float %a, float 2.0)
%fmed = call float @llvm.minnum.f32(float %maxnum, float 4.0)
ret float %fmed
@@ -330,6 +435,14 @@ define float @test_max_min_maybe_NaN_input_ieee_false(float %a) #1 {
; GFX8-NEXT: v_min_f32_e32 v0, 4.0, v0
; GFX8-NEXT: v_max_f32_e32 v0, 2.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_maybe_NaN_input_ieee_false:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v0, v0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 4.0, 2.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call float @llvm.minnum.f32(float %a, float 4.0)
%fmed = call float @llvm.maxnum.f32(float %minnum, float 2.0)
ret float %fmed
@@ -352,6 +465,14 @@ define float @test_max_min_maybe_NaN_input_ieee_true(float %a) #0 {
; GFX8-NEXT: v_min_f32_e32 v0, 4.0, v0
; GFX8-NEXT: v_max_f32_e32 v0, 2.0, v0
; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_max_min_maybe_NaN_input_ieee_true:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v0, v0, v0
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 4.0, 2.0
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%minnum = call float @llvm.minnum.f32(float %a, float 4.0)
%fmed = call float @llvm.maxnum.f32(float %minnum, float 2.0)
ret float %fmed
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.rsq.clamp.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.rsq.clamp.ll
index bd570df3d83b3f..ed298796937c71 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.rsq.clamp.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.rsq.clamp.ll
@@ -1,6 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
; RUN: llc -global-isel -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX12 %s
define float @v_rsq_clamp_f32(float %src) #0 {
; SI-LABEL: v_rsq_clamp_f32:
@@ -16,6 +17,15 @@ define float @v_rsq_clamp_f32(float %src) #0 {
; VI-NEXT: v_min_f32_e32 v0, 0x7f7fffff, v0
; VI-NEXT: v_max_f32_e32 v0, 0xff7fffff, v0
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f32_e32 v0, v0
+; GFX12-NEXT: v_mov_b32_e32 v1, 0xff7fffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 0x7f7fffff, v1
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%rsq_clamp = call float @llvm.amdgcn.rsq.clamp.f32(float %src)
ret float %rsq_clamp
}
@@ -34,6 +44,15 @@ define float @v_rsq_clamp_fabs_f32(float %src) #0 {
; VI-NEXT: v_min_f32_e32 v0, 0x7f7fffff, v0
; VI-NEXT: v_max_f32_e32 v0, 0xff7fffff, v0
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_fabs_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f32_e64 v0, |v0|
+; GFX12-NEXT: v_mov_b32_e32 v1, 0xff7fffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 0x7f7fffff, v1
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fabs.src = call float @llvm.fabs.f32(float %src)
%rsq_clamp = call float @llvm.amdgcn.rsq.clamp.f32(float %fabs.src)
ret float %rsq_clamp
@@ -56,6 +75,19 @@ define double @v_rsq_clamp_f64(double %src) #0 {
; VI-NEXT: s_mov_b32 s5, 0xffefffff
; VI-NEXT: v_max_f64 v[0:1], v[0:1], s[4:5]
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f64_e32 v[0:1], v[0:1]
+; GFX12-NEXT: s_mov_b32 s0, -1
+; GFX12-NEXT: s_mov_b32 s1, 0x7fefffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_min_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_mov_b32 s1, 0xffefffff
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_max_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%rsq_clamp = call double @llvm.amdgcn.rsq.clamp.f64(double %src)
ret double %rsq_clamp
}
@@ -77,6 +109,19 @@ define double @v_rsq_clamp_fabs_f64(double %src) #0 {
; VI-NEXT: s_mov_b32 s5, 0xffefffff
; VI-NEXT: v_max_f64 v[0:1], v[0:1], s[4:5]
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_fabs_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f64_e64 v[0:1], |v[0:1]|
+; GFX12-NEXT: s_mov_b32 s0, -1
+; GFX12-NEXT: s_mov_b32 s1, 0x7fefffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_min_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_mov_b32 s1, 0xffefffff
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_max_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%fabs.src = call double @llvm.fabs.f64(double %src)
%rsq_clamp = call double @llvm.amdgcn.rsq.clamp.f64(double %fabs.src)
ret double %rsq_clamp
@@ -96,6 +141,15 @@ define float @v_rsq_clamp_undef_f32() #0 {
; VI-NEXT: v_min_f32_e32 v0, 0x7f7fffff, v0
; VI-NEXT: v_max_f32_e32 v0, 0xff7fffff, v0
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_undef_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f32_e32 v0, s0
+; GFX12-NEXT: v_mov_b32_e32 v1, 0xff7fffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 0x7f7fffff, v1
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%rsq_clamp = call float @llvm.amdgcn.rsq.clamp.f32(float undef)
ret float %rsq_clamp
}
@@ -117,6 +171,19 @@ define double @v_rsq_clamp_undef_f64() #0 {
; VI-NEXT: s_mov_b32 s5, 0xffefffff
; VI-NEXT: v_max_f64 v[0:1], v[0:1], s[4:5]
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_undef_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f64_e32 v[0:1], s[0:1]
+; GFX12-NEXT: s_mov_b32 s0, -1
+; GFX12-NEXT: s_mov_b32 s1, 0x7fefffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_min_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_mov_b32 s1, 0xffefffff
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_max_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%rsq_clamp = call double @llvm.amdgcn.rsq.clamp.f64(double undef)
ret double %rsq_clamp
}
@@ -135,6 +202,15 @@ define float @v_rsq_clamp_f32_non_ieee(float %src) #2 {
; VI-NEXT: v_min_f32_e32 v0, 0x7f7fffff, v0
; VI-NEXT: v_max_f32_e32 v0, 0xff7fffff, v0
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_f32_non_ieee:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f32_e32 v0, v0
+; GFX12-NEXT: v_mov_b32_e32 v1, 0xff7fffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_minmax_num_f32 v0, v0, 0x7f7fffff, v1
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%rsq_clamp = call float @llvm.amdgcn.rsq.clamp.f32(float %src)
ret float %rsq_clamp
}
@@ -156,6 +232,19 @@ define double @v_rsq_clamp_f64_non_ieee(double %src) #2 {
; VI-NEXT: s_mov_b32 s5, 0xffefffff
; VI-NEXT: v_max_f64 v[0:1], v[0:1], s[4:5]
; VI-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: v_rsq_clamp_f64_non_ieee:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT: v_rsq_f64_e32 v[0:1], v[0:1]
+; GFX12-NEXT: s_mov_b32 s0, -1
+; GFX12-NEXT: s_mov_b32 s1, 0x7fefffff
+; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_min_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_mov_b32 s1, 0xffefffff
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
+; GFX12-NEXT: v_max_num_f64_e32 v[0:1], s[0:1], v[0:1]
+; GFX12-NEXT: s_setpc_b64 s[30:31]
%rsq_clamp = call double @llvm.amdgcn.rsq.clamp.f64(double %src)
ret double %rsq_clamp
}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
index 95f55a6d800ece..a97d905f2a978c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
@@ -1,5 +1,6 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1200 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s --check-prefix=GFX12
---
name: test_fmed3_f32_known_nnan_ieee_true
legalized: true
@@ -22,6 +23,16 @@ body: |
; CHECK-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = nnan G_AMDGPU_CLAMP [[FMUL]]
; CHECK-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_f32_known_nnan_ieee_true
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 2.000000e+00
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+ ; GFX12-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = nnan G_AMDGPU_CLAMP [[FMUL]]
+ ; GFX12-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
%0:vgpr(s32) = COPY $vgpr0
%2:sgpr(s32) = G_FCONSTANT float 2.000000e+00
%8:vgpr(s32) = COPY %2(s32)
@@ -58,6 +69,18 @@ body: |
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s16) = nnan G_AMDGPU_CLAMP [[FMUL]]
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:vgpr(s32) = G_ANYEXT [[AMDGPU_CLAMP]](s16)
; CHECK-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_f16_known_nnan_ieee_false
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[TRUNC:%[0-9]+]]:vgpr(s16) = G_TRUNC [[COPY]](s32)
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s16) = G_FCONSTANT half 0xH4000
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s16) = COPY [[C]](s16)
+ ; GFX12-NEXT: [[FMUL:%[0-9]+]]:vgpr(s16) = G_FMUL [[TRUNC]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s16) = nnan G_AMDGPU_CLAMP [[FMUL]]
+ ; GFX12-NEXT: [[ANYEXT:%[0-9]+]]:vgpr(s32) = G_ANYEXT [[AMDGPU_CLAMP]](s16)
+ ; GFX12-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
%2:vgpr(s32) = COPY $vgpr0
%0:vgpr(s16) = G_TRUNC %2(s32)
%3:sgpr(s16) = G_FCONSTANT half 0xH4000
@@ -95,6 +118,17 @@ body: |
; CHECK-NEXT: [[FMINNUM_IEEE:%[0-9]+]]:vgpr(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[COPY1]]
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMINNUM_IEEE]]
; CHECK-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_true
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 1.000000e+01
+ ; GFX12-NEXT: [[FCANONICALIZE:%[0-9]+]]:vgpr(s32) = G_FCANONICALIZE [[COPY]]
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+ ; GFX12-NEXT: [[FMINNUM_IEEE:%[0-9]+]]:vgpr(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMINNUM_IEEE]]
+ ; GFX12-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
%0:vgpr(s32) = COPY $vgpr0
%2:sgpr(s32) = G_FCONSTANT float 1.000000e+01
%8:vgpr(s32) = G_FCANONICALIZE %0
@@ -130,6 +164,16 @@ body: |
; CHECK-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
; CHECK-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_maybe_SNaN_input_zero_third_operand_ieee_true_dx10clamp_true
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 2.000000e+00
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+ ; GFX12-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
+ ; GFX12-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
%0:vgpr(s32) = COPY $vgpr0
%2:sgpr(s32) = G_FCONSTANT float 2.000000e+00
%8:vgpr(s32) = COPY %2(s32)
@@ -170,6 +214,16 @@ body: |
; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[C1]](s32)
; CHECK-NEXT: [[AMDGPU_FMED3_:%[0-9]+]]:vgpr(s32) = G_AMDGPU_FMED3 [[FMUL]], [[COPY2]], [[COPY3]]
; CHECK-NEXT: $vgpr0 = COPY [[AMDGPU_FMED3_]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_f32_maybe_NaN_ieee_false
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 2.000000e+00
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+ ; GFX12-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
+ ; GFX12-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
%0:vgpr(s32) = COPY $vgpr0
%2:sgpr(s32) = G_FCONSTANT float 2.000000e+00
%8:vgpr(s32) = COPY %2(s32)
@@ -209,6 +263,17 @@ body: |
; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[C1]](s32)
; CHECK-NEXT: [[AMDGPU_FMED3_:%[0-9]+]]:vgpr(s32) = G_AMDGPU_FMED3 [[FMINNUM_IEEE]], [[COPY2]], [[COPY3]]
; CHECK-NEXT: $vgpr0 = COPY [[AMDGPU_FMED3_]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_false
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 1.000000e+01
+ ; GFX12-NEXT: [[FCANONICALIZE:%[0-9]+]]:vgpr(s32) = G_FCANONICALIZE [[COPY]]
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+ ; GFX12-NEXT: [[FMINNUM_IEEE:%[0-9]+]]:vgpr(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMINNUM_IEEE]]
+ ; GFX12-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
%0:vgpr(s32) = COPY $vgpr0
%2:sgpr(s32) = G_FCONSTANT float 1.000000e+01
%8:vgpr(s32) = G_FCANONICALIZE %0
@@ -244,6 +309,16 @@ body: |
; CHECK-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
; CHECK-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
+ ;
+ ; GFX12-LABEL: name: test_fmed3_maybe_SNaN_input_ieee_true_dx10clamp_true
+ ; GFX12: liveins: $vgpr0
+ ; GFX12-NEXT: {{ $}}
+ ; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+ ; GFX12-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 2.000000e+00
+ ; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+ ; GFX12-NEXT: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
+ ; GFX12-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
+ ; GFX12-NEXT: $vgpr0 = COPY [[AMDGPU_CLAMP]](s32)
%0:vgpr(s32) = COPY $vgpr0
%2:sgpr(s32) = G_FCONSTANT float 2.000000e+00
%8:vgpr(s32) = COPY %2(s32)
diff --git a/llvm/test/CodeGen/AMDGPU/amdpal-msgpack-ieee.ll b/llvm/test/CodeGen/AMDGPU/amdpal-msgpack-ieee.ll
index 4da9080e2c69b8..95d533544c3082 100644
--- a/llvm/test/CodeGen/AMDGPU/amdpal-msgpack-ieee.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdpal-msgpack-ieee.ll
@@ -1,11 +1,13 @@
; RUN: llc -mtriple=amdgcn--amdpal -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
; RUN: llc -mtriple=amdgcn--amdpal -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI %s
; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX9 -enable-var-scope %s
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX12 -enable-var-scope %s
; amdpal compute shader: check for 0x2e12 (COMPUTE_PGM_RSRC1) in pal metadata
; SI-DAG: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf0000{{$}}
; VI-DAG: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf02c0{{$}}
; GFX9-DAG: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf0000{{$}}
+; GFX12-DAG: 0x2e12 (COMPUTE_PGM_RSRC1): 0x600f0000{{$}}
define amdgpu_cs half @cs_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
@@ -15,6 +17,7 @@ define amdgpu_cs half @cs_amdpal(half %arg0) #0 {
; SI-DAG: 0x2cca (SPI_SHADER_PGM_RSRC1_ES): 0xaf0000{{$}}
; VI-DAG: 0x2cca (SPI_SHADER_PGM_RSRC1_ES): 0xaf02c0{{$}}
; GFX9-DAG: 0x2cca (SPI_SHADER_PGM_RSRC1_ES): 0xaf0000{{$}}
+; GFX12-DAG: 0x2cca (SPI_SHADER_PGM_RSRC1_ES): 0xf0000{{$}}
define amdgpu_es half @es_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
@@ -24,6 +27,7 @@ define amdgpu_es half @es_amdpal(half %arg0) #0 {
; SI-DAG: 0x2c8a (SPI_SHADER_PGM_RSRC1_GS): 0xaf0000{{$}}
; VI-DAG: 0x2c8a (SPI_SHADER_PGM_RSRC1_GS): 0xaf02c0{{$}}
; GFX9-DAG: 0x2c8a (SPI_SHADER_PGM_RSRC1_GS): 0xaf0000{{$}}
+; GFX12-DAG: 0x2c8a (SPI_SHADER_PGM_RSRC1_GS): 0xa0f0000{{$}}
define amdgpu_gs half @gs_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
@@ -33,6 +37,7 @@ define amdgpu_gs half @gs_amdpal(half %arg0) #0 {
; SI-DAG: 0x2d0a (SPI_SHADER_PGM_RSRC1_HS): 0xaf0000{{$}}
; VI-DAG: 0x2d0a (SPI_SHADER_PGM_RSRC1_HS): 0xaf02c0{{$}}
; GFX9-DAG: 0x2d0a (SPI_SHADER_PGM_RSRC1_HS): 0xaf0000{{$}}
+; GFX12-DAG: 0x2d0a (SPI_SHADER_PGM_RSRC1_HS): 0x50f0000{{$}}
define amdgpu_hs half @hs_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
@@ -42,6 +47,7 @@ define amdgpu_hs half @hs_amdpal(half %arg0) #0 {
; SI-DAG: 0x2d4a (SPI_SHADER_PGM_RSRC1_LS): 0xaf0000{{$}}
; VI-DAG: 0x2d4a (SPI_SHADER_PGM_RSRC1_LS): 0xaf02c0{{$}}
; GFX9-DAG: 0x2d4a (SPI_SHADER_PGM_RSRC1_LS): 0xaf0000{{$}}
+; GFX12-DAG: 0x2d4a (SPI_SHADER_PGM_RSRC1_LS): 0xf0000{{$}}
define amdgpu_ls half @ls_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
@@ -52,6 +58,7 @@ define amdgpu_ls half @ls_amdpal(half %arg0) #0 {
; SI-DAG: 0x2c0a (SPI_SHADER_PGM_RSRC1_PS): 0xaf0000{{$}}
; VI-DAG: 0x2c0a (SPI_SHADER_PGM_RSRC1_PS): 0xaf02c0{{$}}
; GFX9-DAG: 0x2c0a (SPI_SHADER_PGM_RSRC1_PS): 0xaf0000{{$}}
+; GFX12-DAG: 0x2c0a (SPI_SHADER_PGM_RSRC1_PS): 0x20f0000{{$}}
define amdgpu_ps half @ps_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
@@ -61,6 +68,7 @@ define amdgpu_ps half @ps_amdpal(half %arg0) #0 {
; SI-DAG: 0x2c4a (SPI_SHADER_PGM_RSRC1_VS): 0xaf0000{{$}}
; VI-DAG: 0x2c4a (SPI_SHADER_PGM_RSRC1_VS): 0xaf02c0{{$}}
; GFX9-DAG: 0x2c4a (SPI_SHADER_PGM_RSRC1_VS): 0xaf0000{{$}}
+; GFX12-DAG: 0x2c4a (SPI_SHADER_PGM_RSRC1_VS): 0x80f0000{{$}}
define amdgpu_vs half @vs_amdpal(half %arg0) #0 {
%add = fadd half %arg0, 1.0
ret half %add
diff --git a/llvm/test/CodeGen/AMDGPU/clamp.ll b/llvm/test/CodeGen/AMDGPU/clamp.ll
index 20b248875487ec..3c0b8f7712e198 100644
--- a/llvm/test/CodeGen/AMDGPU/clamp.ll
+++ b/llvm/test/CodeGen/AMDGPU/clamp.ll
@@ -3,6 +3,7 @@
; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s
+; RUN: llc -march=amdgcn -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX12 %s
define amdgpu_kernel void @v_clamp_f32(ptr addrspace(1) %out, ptr addrspace(1) %aptr) #0 {
; GFX6-LABEL: v_clamp_f32:
@@ -61,6 +62,19 @@ define amdgpu_kernel void @v_clamp_f32(ptr addrspace(1) %out, ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -129,6 +143,19 @@ define amdgpu_kernel void @v_clamp_neg_f32(ptr addrspace(1) %out, ptr addrspace(
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_neg_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, -v1, -v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -198,6 +225,19 @@ define amdgpu_kernel void @v_clamp_negabs_f32(ptr addrspace(1) %out, ptr addrspa
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_negabs_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, -|v1|, -|v1| clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -277,6 +317,21 @@ define amdgpu_kernel void @v_clamp_negzero_f32(ptr addrspace(1) %out, ptr addrsp
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_negzero_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_add_f32_e32 v1, 0.5, v1
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_maxmin_num_f32 v1, v1, 0x80000000, 1.0
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -356,6 +411,21 @@ define amdgpu_kernel void @v_clamp_negzero_maybe_snan_f32(ptr addrspace(1) %out,
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_negzero_maybe_snan_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v1, v1, v1
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_maxmin_num_f32 v1, v1, 0x80000000, 1.0
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -442,6 +512,24 @@ define amdgpu_kernel void @v_clamp_multi_use_max_f32(ptr addrspace(1) %out, ptr
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_multi_use_max_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e32 v1, v1, v1
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_max_num_f32_e32 v1, 0, v1
+; GFX12-NEXT: v_min_num_f32_e32 v2, 1.0, v1
+; GFX12-NEXT: global_store_b32 v0, v2, s[0:1]
+; GFX12-NEXT: global_store_b32 v[0:1], v1, off th:TH_STORE_NT_RT
+; GFX12-NEXT: s_waitcnt_vscnt null, 0x0
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -512,6 +600,19 @@ define amdgpu_kernel void @v_clamp_f16(ptr addrspace(1) %out, ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 1, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_u16 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f16_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b16 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr half, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr half, ptr addrspace(1) %out, i32 %tid
@@ -581,6 +682,19 @@ define amdgpu_kernel void @v_clamp_neg_f16(ptr addrspace(1) %out, ptr addrspace(
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_neg_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 1, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_u16 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f16_e64 v1, -v1, -v1 clamp
+; GFX12-NEXT: global_store_b16 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr half, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr half, ptr addrspace(1) %out, i32 %tid
@@ -651,6 +765,19 @@ define amdgpu_kernel void @v_clamp_negabs_f16(ptr addrspace(1) %out, ptr addrspa
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_negabs_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 1, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_u16 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f16_e64 v1, -|v1|, -|v1| clamp
+; GFX12-NEXT: global_store_b16 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr half, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr half, ptr addrspace(1) %out, i32 %tid
@@ -722,6 +849,19 @@ define amdgpu_kernel void @v_clamp_f64(ptr addrspace(1) %out, ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v2, 3, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b64 v[0:1], v2, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_f64 v[0:1], v[0:1], v[0:1] clamp
+; GFX12-NEXT: global_store_b64 v2, v[0:1], s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr double, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr double, ptr addrspace(1) %out, i32 %tid
@@ -790,6 +930,19 @@ define amdgpu_kernel void @v_clamp_neg_f64(ptr addrspace(1) %out, ptr addrspace(
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_neg_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v2, 3, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b64 v[0:1], v2, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_f64 v[0:1], -v[0:1], -v[0:1] clamp
+; GFX12-NEXT: global_store_b64 v2, v[0:1], s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr double, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr double, ptr addrspace(1) %out, i32 %tid
@@ -859,6 +1012,19 @@ define amdgpu_kernel void @v_clamp_negabs_f64(ptr addrspace(1) %out, ptr addrspa
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_negabs_f64:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v2, 3, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b64 v[0:1], v2, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_f64 v[0:1], -|v[0:1]|, -|v[0:1]| clamp
+; GFX12-NEXT: global_store_b64 v2, v[0:1], s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr double, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr double, ptr addrspace(1) %out, i32 %tid
@@ -933,6 +1099,19 @@ define amdgpu_kernel void @v_clamp_med3_aby_negzero_f32(ptr addrspace(1) %out, p
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_aby_negzero_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_med3_num_f32 v1, 0x80000000, 1.0, v1
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -999,6 +1178,19 @@ define amdgpu_kernel void @v_clamp_med3_aby_f32(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_aby_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1065,6 +1257,19 @@ define amdgpu_kernel void @v_clamp_med3_bay_f32(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_bay_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1131,6 +1336,19 @@ define amdgpu_kernel void @v_clamp_med3_yab_f32(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_yab_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1197,6 +1415,19 @@ define amdgpu_kernel void @v_clamp_med3_yba_f32(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_yba_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1263,6 +1494,19 @@ define amdgpu_kernel void @v_clamp_med3_ayb_f32(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_ayb_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1329,6 +1573,19 @@ define amdgpu_kernel void @v_clamp_med3_bya_f32(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_bya_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1381,6 +1638,16 @@ define amdgpu_kernel void @v_clamp_constants_to_one_f32(ptr addrspace(1) %out) #
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constants_to_one_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 1.0 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 4.0)
@@ -1430,6 +1697,16 @@ define amdgpu_kernel void @v_clamp_constants_to_zero_f32(ptr addrspace(1) %out)
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constants_to_zero_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float -4.0)
@@ -1480,6 +1757,16 @@ define amdgpu_kernel void @v_clamp_constant_preserve_f32(ptr addrspace(1) %out)
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constant_preserve_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0.5 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0.5)
@@ -1530,6 +1817,16 @@ define amdgpu_kernel void @v_clamp_constant_preserve_denorm_f32(ptr addrspace(1)
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constant_preserve_denorm_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0x7fffff :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float bitcast (i32 8388607 to float))
@@ -1579,6 +1876,16 @@ define amdgpu_kernel void @v_clamp_constant_qnan_f32(ptr addrspace(1) %out) #0 {
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constant_qnan_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8000000000000)
@@ -1628,6 +1935,16 @@ define amdgpu_kernel void @v_clamp_constant_snan_f32(ptr addrspace(1) %out) #0 {
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constant_snan_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float bitcast (i32 2139095041 to float))
@@ -1701,6 +2018,19 @@ define amdgpu_kernel void @v_clamp_f32_no_dx10_clamp(ptr addrspace(1) %out, ptr
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_add_f32_e64 v1, v1, 0.5 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1770,6 +2100,19 @@ define amdgpu_kernel void @v_clamp_f32_snan_dx10clamp(ptr addrspace(1) %out, ptr
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f32_snan_dx10clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_add_f32_e64 v1, v1, 0.5 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1844,6 +2187,19 @@ define amdgpu_kernel void @v_clamp_f32_snan_no_dx10clamp(ptr addrspace(1) %out,
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f32_snan_no_dx10clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1917,6 +2273,19 @@ define amdgpu_kernel void @v_clamp_f32_snan_no_dx10clamp_nnan_src(ptr addrspace(
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_f32_snan_no_dx10clamp_nnan_src:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_add_f32_e64 v1, v1, 1.0 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -1986,6 +2355,19 @@ define amdgpu_kernel void @v_clamp_med3_aby_f32_no_dx10_clamp(ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_aby_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -2052,6 +2434,19 @@ define amdgpu_kernel void @v_clamp_med3_bay_f32_no_dx10_clamp(ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_bay_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -2118,6 +2513,19 @@ define amdgpu_kernel void @v_clamp_med3_yab_f32_no_dx10_clamp(ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_yab_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -2184,6 +2592,19 @@ define amdgpu_kernel void @v_clamp_med3_yba_f32_no_dx10_clamp(ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_yba_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -2250,6 +2671,19 @@ define amdgpu_kernel void @v_clamp_med3_ayb_f32_no_dx10_clamp(ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_ayb_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -2316,6 +2750,19 @@ define amdgpu_kernel void @v_clamp_med3_bya_f32_no_dx10_clamp(ptr addrspace(1) %
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_med3_bya_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_max_num_f32_e64 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
@@ -2368,6 +2815,16 @@ define amdgpu_kernel void @v_clamp_constant_qnan_f32_no_dx10_clamp(ptr addrspace
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constant_qnan_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8000000000000)
@@ -2418,6 +2875,16 @@ define amdgpu_kernel void @v_clamp_constant_snan_f32_no_dx10_clamp(ptr addrspace
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_constant_snan_f32_no_dx10_clamp:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
+; GFX12-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, ptr addrspace(1) %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float bitcast (i32 2139095041 to float))
@@ -2490,6 +2957,19 @@ define amdgpu_kernel void @v_clamp_v2f16(ptr addrspace(1) %out, ptr addrspace(1)
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -2577,6 +3057,19 @@ define amdgpu_kernel void @v_clamp_v2f16_undef_elt(ptr addrspace(1) %out, ptr ad
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16_undef_elt:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -2663,6 +3156,22 @@ define amdgpu_kernel void @v_clamp_v2f16_not_zero(ptr addrspace(1) %out, ptr add
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16_not_zero:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, 2.0
+; GFX12-NEXT: v_pk_min_num_f16 v1, v1, 1.0 op_sel_hi:[1,0]
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -2748,6 +3257,22 @@ define amdgpu_kernel void @v_clamp_v2f16_not_one(ptr addrspace(1) %out, ptr addr
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16_not_one:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, 0
+; GFX12-NEXT: v_pk_min_num_f16 v1, v1, 1.0 op_sel:[0,1] op_sel_hi:[1,0]
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -2825,6 +3350,19 @@ define amdgpu_kernel void @v_clamp_neg_v2f16(ptr addrspace(1) %out, ptr addrspac
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_neg_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 neg_lo:[1,1] neg_hi:[1,1] clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -2906,6 +3444,21 @@ define amdgpu_kernel void @v_clamp_negabs_v2f16(ptr addrspace(1) %out, ptr addrs
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_negabs_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_and_b32_e32 v1, 0x7fff7fff, v1
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 neg_lo:[1,1] neg_hi:[1,1] clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -2986,6 +3539,19 @@ define amdgpu_kernel void @v_clamp_neglo_v2f16(ptr addrspace(1) %out, ptr addrsp
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_neglo_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 neg_lo:[1,1] clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -3065,6 +3631,19 @@ define amdgpu_kernel void @v_clamp_neghi_v2f16(ptr addrspace(1) %out, ptr addrsp
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_neghi_v2f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 neg_hi:[1,1] clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -3144,6 +3723,19 @@ define amdgpu_kernel void @v_clamp_v2f16_shuffle(ptr addrspace(1) %out, ptr addr
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16_shuffle:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 op_sel:[1,1] op_sel_hi:[0,0] clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -3232,6 +3824,19 @@ define amdgpu_kernel void @v_clamp_v2f16_undef_limit_elts0(ptr addrspace(1) %out
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16_undef_limit_elts0:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -3319,6 +3924,19 @@ define amdgpu_kernel void @v_clamp_v2f16_undef_limit_elts1(ptr addrspace(1) %out
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_v2f16_undef_limit_elts1:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: global_load_b32 v1, v0, s[2:3]
+; GFX12-NEXT: s_waitcnt vmcnt(0)
+; GFX12-NEXT: v_pk_max_num_f16 v1, v1, v1 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr <2 x half>, ptr addrspace(1) %aptr, i32 %tid
%out.gep = getelementptr <2 x half>, ptr addrspace(1) %out, i32 %tid
@@ -3400,6 +4018,25 @@ define amdgpu_kernel void @v_clamp_diff_source_f32(ptr addrspace(1) %out, ptr ad
; GFX11-NEXT: s_nop 0
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX11-NEXT: s_endpgm
+;
+; GFX12-LABEL: v_clamp_diff_source_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
+; GFX12-NEXT: v_mov_b32_e32 v0, 0
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: s_clause 0x1
+; GFX12-NEXT: s_load_b64 s[4:5], s[2:3], 0x0
+; GFX12-NEXT: s_load_b32 s2, s[2:3], 0x8
+; GFX12-NEXT: s_waitcnt lgkmcnt(0)
+; GFX12-NEXT: s_add_f32 s3, s4, s5
+; GFX12-NEXT: s_add_f32 s2, s4, s2
+; GFX12-NEXT: s_delay_alu instid0(SALU_CYCLE_3) | instskip(NEXT) | instid1(SALU_CYCLE_3)
+; GFX12-NEXT: s_max_num_f32 s2, s3, s2
+; GFX12-NEXT: v_max_num_f32_e64 v1, s2, s2 clamp
+; GFX12-NEXT: global_store_b32 v0, v1, s[0:1] offset:12
+; GFX12-NEXT: s_nop 0
+; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT: s_endpgm
{
%gep1 = getelementptr float, ptr addrspace(1) %aptr, i32 1
%gep2 = getelementptr float, ptr addrspace(1) %aptr, i32 2
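
For context (not part of the patch), here is a minimal LLVM IR sketch of the clamp patterns these CodeGen tests exercise; the function names are illustrative only. The backend folds a med3/min-max clamp to [0.0, 1.0] into the 'clamp' output modifier, which appears in the GFX12 check lines above as v_max_num_f32_e64 ... clamp (and v_pk_max_num_f16 ... clamp for <2 x half>):

declare float @llvm.amdgcn.fmed3.f32(float, float, float)
declare float @llvm.maxnum.f32(float, float)
declare float @llvm.minnum.f32(float, float)

; Clamp via the med3 intrinsic, as in the constant-input tests above.
define float @clamp_via_fmed3(float %x) {
  %c = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %x)
  ret float %c
}

; Clamp via a max/min pair, which the backend also recognizes.
define float @clamp_via_minmax(float %x) {
  %lo = call float @llvm.maxnum.f32(float %x, float 0.0)
  %c  = call float @llvm.minnum.f32(float %lo, float 1.0)
  ret float %c
}

Note that in the GFX12 checks above the constant qnan/snan inputs fold to a store of 0 even in the *_no_dx10_clamp variants, consistent with the DX10_CLAMP mode bit being removed on GFX12.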
diff --git a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
new file mode 100644
index 00000000000000..efbcec21f586b9
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
@@ -0,0 +1,294 @@
+// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 --amdhsa-code-object-version=4 < %s | FileCheck --check-prefix=ASM %s
+// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 --amdhsa-code-object-version=4 -filetype=obj < %s > %t
+// RUN: llvm-readelf -S -r -s %t | FileCheck --check-prefix=READOBJ %s
+// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+
+// READOBJ: Section Headers
+// READOBJ: .text PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
+// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000100 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+
+// READOBJ: Relocation section '.rela.rodata' at offset
+// READOBJ: 0000000000000010 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 10
+// READOBJ: 0000000000000050 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 110
+// READOBJ: 0000000000000090 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 210
+// READOBJ: 00000000000000d0 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 310
+
+// READOBJ: Symbol table '.symtab' contains {{[0-9]+}} entries:
+// READOBJ: 0000000000000000 0 FUNC LOCAL PROTECTED 2 minimal
+// READOBJ-NEXT: 0000000000000100 0 FUNC LOCAL PROTECTED 2 complete
+// READOBJ-NEXT: 0000000000000200 0 FUNC LOCAL PROTECTED 2 special_sgpr
+// READOBJ-NEXT: 0000000000000300 0 FUNC LOCAL PROTECTED 2 disabled_user_sgpr
+// READOBJ-NEXT: 0000000000000000 64 OBJECT LOCAL DEFAULT 3 minimal.kd
+// READOBJ-NEXT: 0000000000000040 64 OBJECT LOCAL DEFAULT 3 complete.kd
+// READOBJ-NEXT: 0000000000000080 64 OBJECT LOCAL DEFAULT 3 special_sgpr.kd
+// READOBJ-NEXT: 00000000000000c0 64 OBJECT LOCAL DEFAULT 3 disabled_user_sgpr.kd
+
+// OBJDUMP: Contents of section .rodata
+// Note, relocation for KERNEL_CODE_ENTRY_BYTE_OFFSET is not resolved here.
+// minimal
+// OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0030 00000c60 80000000 00000000 00000000
+// complete
+// OBJDUMP-NEXT: 0040 01000000 01000000 08000000 00000000
+// OBJDUMP-NEXT: 0050 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0060 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0070 015021e4 1f0f007f 5e040000 00000000
+// special_sgpr
+// OBJDUMP-NEXT: 0080 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0090 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 00a0 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 00b0 00000060 80000000 00000000 00000000
+// disabled_user_sgpr
+// OBJDUMP-NEXT: 00c0 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 00d0 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 00e0 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 00f0 00000c60 80000000 00000000 00000000
+
+.text
+// ASM: .text
+
+.amdgcn_target "amdgcn-amd-amdhsa--gfx1200"
+// ASM: .amdgcn_target "amdgcn-amd-amdhsa--gfx1200"
+
+.p2align 8
+.type minimal,@function
+minimal:
+ s_endpgm
+
+.p2align 8
+.type complete,@function
+complete:
+ s_endpgm
+
+.p2align 8
+.type special_sgpr,@function
+special_sgpr:
+ s_endpgm
+
+.p2align 8
+.type disabled_user_sgpr,@function
+disabled_user_sgpr:
+ s_endpgm
+
+.rodata
+// ASM: .rodata
+
+// Test that only specifying required directives is allowed, and that defaulted
+// values are omitted.
+.p2align 6
+.amdhsa_kernel minimal
+ .amdhsa_next_free_vgpr 0
+ .amdhsa_next_free_sgpr 0
+.end_amdhsa_kernel
+
+// ASM: .amdhsa_kernel minimal
+// ASM: .amdhsa_next_free_vgpr 0
+// ASM-NEXT: .amdhsa_next_free_sgpr 0
+// ASM: .end_amdhsa_kernel
+
+// Test that we can specify all available directives with non-default values.
+.p2align 6
+.amdhsa_kernel complete
+ .amdhsa_group_segment_fixed_size 1
+ .amdhsa_private_segment_fixed_size 1
+ .amdhsa_kernarg_size 8
+ .amdhsa_user_sgpr_count 15
+ .amdhsa_user_sgpr_dispatch_ptr 1
+ .amdhsa_user_sgpr_queue_ptr 1
+ .amdhsa_user_sgpr_kernarg_segment_ptr 1
+ .amdhsa_user_sgpr_dispatch_id 1
+ .amdhsa_user_sgpr_private_segment_size 1
+ .amdhsa_wavefront_size32 1
+ .amdhsa_enable_private_segment 1
+ .amdhsa_system_sgpr_workgroup_id_x 0
+ .amdhsa_system_sgpr_workgroup_id_y 1
+ .amdhsa_system_sgpr_workgroup_id_z 1
+ .amdhsa_system_sgpr_workgroup_info 1
+ .amdhsa_system_vgpr_workitem_id 1
+ .amdhsa_next_free_vgpr 9
+ .amdhsa_next_free_sgpr 27
+ .amdhsa_reserve_vcc 0
+ .amdhsa_float_round_mode_32 1
+ .amdhsa_float_round_mode_16_64 1
+ .amdhsa_float_denorm_mode_32 1
+ .amdhsa_float_denorm_mode_16_64 0
+ .amdhsa_fp16_overflow 1
+ .amdhsa_workgroup_processor_mode 1
+ .amdhsa_memory_ordered 1
+ .amdhsa_forward_progress 1
+ .amdhsa_shared_vgpr_count 0
+ .amdhsa_round_robin_scheduling 1
+ .amdhsa_exception_fp_ieee_invalid_op 1
+ .amdhsa_exception_fp_denorm_src 1
+ .amdhsa_exception_fp_ieee_div_zero 1
+ .amdhsa_exception_fp_ieee_overflow 1
+ .amdhsa_exception_fp_ieee_underflow 1
+ .amdhsa_exception_fp_ieee_inexact 1
+ .amdhsa_exception_int_div_zero 1
+.end_amdhsa_kernel
+
+// ASM: .amdhsa_kernel complete
+// ASM-NEXT: .amdhsa_group_segment_fixed_size 1
+// ASM-NEXT: .amdhsa_private_segment_fixed_size 1
+// ASM-NEXT: .amdhsa_kernarg_size 8
+// ASM-NEXT: .amdhsa_user_sgpr_count 15
+// ASM-NEXT: .amdhsa_user_sgpr_dispatch_ptr 1
+// ASM-NEXT: .amdhsa_user_sgpr_queue_ptr 1
+// ASM-NEXT: .amdhsa_user_sgpr_kernarg_segment_ptr 1
+// ASM-NEXT: .amdhsa_user_sgpr_dispatch_id 1
+// ASM-NEXT: .amdhsa_user_sgpr_private_segment_size 1
+// ASM-NEXT: .amdhsa_wavefront_size32 1
+// ASM-NEXT: .amdhsa_enable_private_segment 1
+// ASM-NEXT: .amdhsa_system_sgpr_workgroup_id_x 0
+// ASM-NEXT: .amdhsa_system_sgpr_workgroup_id_y 1
+// ASM-NEXT: .amdhsa_system_sgpr_workgroup_id_z 1
+// ASM-NEXT: .amdhsa_system_sgpr_workgroup_info 1
+// ASM-NEXT: .amdhsa_system_vgpr_workitem_id 1
+// ASM-NEXT: .amdhsa_next_free_vgpr 9
+// ASM-NEXT: .amdhsa_next_free_sgpr 27
+// ASM-NEXT: .amdhsa_reserve_vcc 0
+// ASM-NEXT: .amdhsa_float_round_mode_32 1
+// ASM-NEXT: .amdhsa_float_round_mode_16_64 1
+// ASM-NEXT: .amdhsa_float_denorm_mode_32 1
+// ASM-NEXT: .amdhsa_float_denorm_mode_16_64 0
+// ASM-NEXT: .amdhsa_fp16_overflow 1
+// ASM-NEXT: .amdhsa_workgroup_processor_mode 1
+// ASM-NEXT: .amdhsa_memory_ordered 1
+// ASM-NEXT: .amdhsa_forward_progress 1
+// ASM-NEXT: .amdhsa_shared_vgpr_count 0
+// ASM-NEXT: .amdhsa_round_robin_scheduling 1
+// ASM-NEXT: .amdhsa_exception_fp_ieee_invalid_op 1
+// ASM-NEXT: .amdhsa_exception_fp_denorm_src 1
+// ASM-NEXT: .amdhsa_exception_fp_ieee_div_zero 1
+// ASM-NEXT: .amdhsa_exception_fp_ieee_overflow 1
+// ASM-NEXT: .amdhsa_exception_fp_ieee_underflow 1
+// ASM-NEXT: .amdhsa_exception_fp_ieee_inexact 1
+// ASM-NEXT: .amdhsa_exception_int_div_zero 1
+// ASM-NEXT: .end_amdhsa_kernel
+
+// Test that we are including special SGPR usage in the granulated count.
+.p2align 6
+.amdhsa_kernel special_sgpr
+ .amdhsa_next_free_sgpr 27
+
+ .amdhsa_reserve_vcc 0
+
+ .amdhsa_float_denorm_mode_16_64 0
+ .amdhsa_next_free_vgpr 0
+.end_amdhsa_kernel
+
+// ASM: .amdhsa_kernel special_sgpr
+// ASM: .amdhsa_next_free_vgpr 0
+// ASM-NEXT: .amdhsa_next_free_sgpr 27
+// ASM-NEXT: .amdhsa_reserve_vcc 0
+// ASM: .amdhsa_float_denorm_mode_16_64 0
+// ASM: .end_amdhsa_kernel
+
+// Test that explicitly disabling user_sgpr's does not affect the user_sgpr
+// count, i.e. this should produce the same descriptor as minimal.
+.p2align 6
+.amdhsa_kernel disabled_user_sgpr
+ .amdhsa_next_free_vgpr 0
+ .amdhsa_next_free_sgpr 0
+.end_amdhsa_kernel
+
+// ASM: .amdhsa_kernel disabled_user_sgpr
+// ASM: .amdhsa_next_free_vgpr 0
+// ASM-NEXT: .amdhsa_next_free_sgpr 0
+// ASM: .end_amdhsa_kernel
+
+.section .foo
+
+.byte .amdgcn.gfx_generation_number
+// ASM: .byte 12
+
+.byte .amdgcn.gfx_generation_minor
+// ASM: .byte 0
+
+.byte .amdgcn.gfx_generation_stepping
+// ASM: .byte 0
+
+.byte .amdgcn.next_free_vgpr
+// ASM: .byte 0
+.byte .amdgcn.next_free_sgpr
+// ASM: .byte 0
+
+v_mov_b32_e32 v7, s10
+
+.byte .amdgcn.next_free_vgpr
+// ASM: .byte 8
+.byte .amdgcn.next_free_sgpr
+// ASM: .byte 11
+
+.set .amdgcn.next_free_vgpr, 0
+.set .amdgcn.next_free_sgpr, 0
+
+.byte .amdgcn.next_free_vgpr
+// ASM: .byte 0
+.byte .amdgcn.next_free_sgpr
+// ASM: .byte 0
+
+v_mov_b32_e32 v16, s3
+
+.byte .amdgcn.next_free_vgpr
+// ASM: .byte 17
+.byte .amdgcn.next_free_sgpr
+// ASM: .byte 4
+
+// Metadata
+
+.amdgpu_metadata
+ amdhsa.version:
+ - 3
+ - 0
+ amdhsa.kernels:
+ - .name: amd_kernel_code_t_test_all
+ .symbol: amd_kernel_code_t_test_all@kd
+ .kernarg_segment_size: 8
+ .group_segment_fixed_size: 16
+ .private_segment_fixed_size: 32
+ .kernarg_segment_align: 64
+ .wavefront_size: 128
+ .sgpr_count: 14
+ .vgpr_count: 40
+ .max_flat_workgroup_size: 256
+ - .name: amd_kernel_code_t_minimal
+ .symbol: amd_kernel_code_t_minimal@kd
+ .kernarg_segment_size: 8
+ .group_segment_fixed_size: 16
+ .private_segment_fixed_size: 32
+ .kernarg_segment_align: 64
+ .wavefront_size: 128
+ .sgpr_count: 14
+ .vgpr_count: 40
+ .max_flat_workgroup_size: 256
+.end_amdgpu_metadata
+
+// ASM: .amdgpu_metadata
+// ASM: amdhsa.kernels:
+// ASM: - .group_segment_fixed_size: 16
+// ASM: .kernarg_segment_align: 64
+// ASM: .kernarg_segment_size: 8
+// ASM: .max_flat_workgroup_size: 256
+// ASM: .name: amd_kernel_code_t_test_all
+// ASM: .private_segment_fixed_size: 32
+// ASM: .sgpr_count: 14
+// ASM: .symbol: 'amd_kernel_code_t_test_all@kd'
+// ASM: .vgpr_count: 40
+// ASM: .wavefront_size: 128
+// ASM: - .group_segment_fixed_size: 16
+// ASM: .kernarg_segment_align: 64
+// ASM: .kernarg_segment_size: 8
+// ASM: .max_flat_workgroup_size: 256
+// ASM: .name: amd_kernel_code_t_minimal
+// ASM: .private_segment_fixed_size: 32
+// ASM: .sgpr_count: 14
+// ASM: .symbol: 'amd_kernel_code_t_minimal@kd'
+// ASM: .vgpr_count: 40
+// ASM: .wavefront_size: 128
+// ASM: amdhsa.version:
+// ASM-NEXT: - 3
+// ASM-NEXT: - 0
+// ASM: .end_amdgpu_metadata
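
For context (not part of the patch), a minimal sketch of what changes for kernel descriptor directives on gfx12: .amdhsa_dx10_clamp and .amdhsa_ieee_mode are rejected by the assembler (the AMDGPUAsmParser.cpp hunks below report "directive unsupported on gfx12+"), while .amdhsa_round_robin_scheduling becomes available. The kernel name here is illustrative only:

.amdhsa_kernel rr_sketch                // illustrative kernel name
  .amdhsa_next_free_vgpr 8
  .amdhsa_next_free_sgpr 8
  .amdhsa_round_robin_scheduling 1      // gfx12+ only (maps to ENABLE_WG_RR_EN)
//  .amdhsa_dx10_clamp 1                // would be rejected: directive unsupported on gfx12+
//  .amdhsa_ieee_mode 1                 // would be rejected: directive unsupported on gfx12+
.end_amdhsa_kernel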
>From df1ca8a16f907d0c4e3c5f5b6e13acf7636376a2 Mon Sep 17 00:00:00 2001
From: Piotr Sobczak <piotr.sobczak at amd.com>
Date: Mon, 11 Dec 2023 09:34:25 +0100
Subject: [PATCH 2/4] Clang formatter/docs fixups
---
llvm/docs/AMDGPUUsage.rst | 2 +-
llvm/include/llvm/Support/AMDHSAKernelDescriptor.h | 8 ++++----
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 12 ++++++------
.../AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp | 9 +++------
llvm/lib/Target/AMDGPU/SIDefines.h | 6 +++---
llvm/lib/Target/AMDGPU/SIProgramInfo.cpp | 2 +-
llvm/lib/Target/AMDGPU/SIProgramInfo.h | 2 +-
7 files changed, 19 insertions(+), 22 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c7327623493e2d..5fcf651046943f 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -15494,7 +15494,7 @@ terminated by an ``.end_amdhsa_kernel`` directive.
``.amdhsa_forward_progress`` 0 GFX10-GFX12 Controls FWD_PROGRESS in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_shared_vgpr_count`` 0 GFX10-GFX11 Controls SHARED_VGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table`.
``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
index ba650681824171..2de2cf4185d86e 100644
--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
@@ -89,8 +89,8 @@ enum : uint8_t {
#define COMPUTE_PGM_RSRC1_GFX6_GFX9(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX6_GFX9_ ## NAME, SHIFT, WIDTH)
// [GFX6-GFX11].
-#define COMPUTE_PGM_RSRC1_GFX6_GFX11(NAME, SHIFT, WIDTH) \
- AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX6_GFX11_ ## NAME, SHIFT, WIDTH)
+#define COMPUTE_PGM_RSRC1_GFX6_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX6_GFX11_##NAME, SHIFT, WIDTH)
// GFX9+.
#define COMPUTE_PGM_RSRC1_GFX9_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX9_PLUS_ ## NAME, SHIFT, WIDTH)
@@ -98,8 +98,8 @@ enum : uint8_t {
#define COMPUTE_PGM_RSRC1_GFX10_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX10_PLUS_ ## NAME, SHIFT, WIDTH)
// GFX12+.
-#define COMPUTE_PGM_RSRC1_GFX12_PLUS(NAME, SHIFT, WIDTH) \
- AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX12_PLUS_ ## NAME, SHIFT, WIDTH)
+#define COMPUTE_PGM_RSRC1_GFX12_PLUS(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_GFX12_PLUS_##NAME, SHIFT, WIDTH)
enum : int32_t {
COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index d2015c615846ca..7aa509124dcb46 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -5337,14 +5337,14 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
if (IVersion.Major >= 12)
return Error(IDRange.Start, "directive unsupported on gfx12+", IDRange);
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1,
- COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP,
- Val, ValRange);
+ COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP, Val,
+ ValRange);
} else if (ID == ".amdhsa_ieee_mode") {
if (IVersion.Major >= 12)
return Error(IDRange.Start, "directive unsupported on gfx12+", IDRange);
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1,
- COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE,
- Val, ValRange);
+ COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE, Val,
+ ValRange);
} else if (ID == ".amdhsa_fp16_overflow") {
if (IVersion.Major < 9)
return Error(IDRange.Start, "directive requires gfx9+", IDRange);
@@ -5411,8 +5411,8 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
if (IVersion.Major < 12)
return Error(IDRange.Start, "directive requires gfx12+", IDRange);
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1,
- COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN,
- Val, ValRange);
+ COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN, Val,
+ ValRange);
} else {
return Error(IDRange.Start, "unknown .amdhsa_kernel directive", IDRange);
}
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index 693127c3e89709..a855cf585205bc 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -452,11 +452,9 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64);
if (IVersion.Major < 12) {
- PRINT_FIELD(OS, ".amdhsa_dx10_clamp", KD,
- compute_pgm_rsrc1,
+ PRINT_FIELD(OS, ".amdhsa_dx10_clamp", KD, compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP);
- PRINT_FIELD(OS, ".amdhsa_ieee_mode", KD,
- compute_pgm_rsrc1,
+ PRINT_FIELD(OS, ".amdhsa_ieee_mode", KD, compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE);
}
if (IVersion.Major >= 9)
@@ -481,8 +479,7 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
amdhsa::COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
}
if (IVersion.Major >= 12)
- PRINT_FIELD(OS, ".amdhsa_round_robin_scheduling", KD,
- compute_pgm_rsrc1,
+ PRINT_FIELD(OS, ".amdhsa_round_robin_scheduling", KD, compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_GFX12_PLUS_ENABLE_WG_RR_EN);
PRINT_FIELD(
OS, ".amdhsa_exception_fp_ieee_invalid_op", KD,
diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index a63db6c22e0698..8878313aec57cb 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -1117,9 +1117,9 @@ enum Register_Flag : uint8_t {
#define S_00B848_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_00B848_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_00B848_DX10_CLAMP 0xFFDFFFFF
-#define S_00B848_RR_WG_MODE(x) (((x) & 0x1) << 21)
-#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)
-#define C_00B848_RR_WG_MODE 0xFFDFFFFF
+#define S_00B848_RR_WG_MODE(x) (((x)&0x1) << 21)
+#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)
+#define C_00B848_RR_WG_MODE 0xFFDFFFFF
#define S_00B848_DEBUG_MODE(x) (((x) & 0x1) << 22)
#define G_00B848_DEBUG_MODE(x) (((x) >> 22) & 0x1)
#define C_00B848_DEBUG_MODE 0xFFBFFFFF
diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index e0763ba9997def..9ed7aacc0538ec 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -14,8 +14,8 @@
//===----------------------------------------------------------------------===//
//
-#include "GCNSubtarget.h"
#include "SIProgramInfo.h"
+#include "GCNSubtarget.h"
#include "SIDefines.h"
#include "Utils/AMDGPUBaseInfo.h"
diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.h b/llvm/lib/Target/AMDGPU/SIProgramInfo.h
index a178a75c448a89..8c26789f936cff 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.h
@@ -36,7 +36,7 @@ struct SIProgramInfo {
uint32_t IEEEMode = 0;
uint32_t WgpMode = 0; // GFX10+
uint32_t MemOrdered = 0; // GFX10+
- uint32_t RrWgMode = 0; // GFX12+
+ uint32_t RrWgMode = 0; // GFX12+
uint64_t ScratchSize = 0;
// State used to calculate fields set in PGM_RSRC2 pm4 packet.
>From ca1ecebc39cbd99c80c4f0df06373e8d1e573211 Mon Sep 17 00:00:00 2001
From: Piotr Sobczak <piotr.sobczak at amd.com>
Date: Mon, 11 Dec 2023 09:59:19 +0100
Subject: [PATCH 3/4] Clang formatter fixup
---
llvm/lib/Target/AMDGPU/SIDefines.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index 8878313aec57cb..bc629b855c5cff 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -1117,7 +1117,7 @@ enum Register_Flag : uint8_t {
#define S_00B848_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_00B848_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_00B848_DX10_CLAMP 0xFFDFFFFF
-#define S_00B848_RR_WG_MODE(x) (((x)&0x1) << 21)
+#define S_00B848_RR_WG_MODE(x) (((x) & 0x1) << 21)
#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)
#define C_00B848_RR_WG_MODE 0xFFDFFFFF
#define S_00B848_DEBUG_MODE(x) (((x) & 0x1) << 22)
>From d25d2028958723d0881191803e57df3853bd8507 Mon Sep 17 00:00:00 2001
From: Piotr Sobczak <piotr.sobczak at amd.com>
Date: Wed, 13 Dec 2023 12:16:12 +0100
Subject: [PATCH 4/4] Formatting fixup
---
llvm/lib/Target/AMDGPU/SIDefines.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index a3332f7175502a..42a5d14155f4f2 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -1120,9 +1120,9 @@ enum Register_Flag : uint8_t {
#define S_00B848_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_00B848_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_00B848_DX10_CLAMP 0xFFDFFFFF
-#define S_00B848_RR_WG_MODE(x) (((x) & 0x1) << 21)
-#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)
-#define C_00B848_RR_WG_MODE 0xFFDFFFFF
+#define S_00B848_RR_WG_MODE(x) (((x) & 0x1) << 21)
+#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)
+#define C_00B848_RR_WG_MODE 0xFFDFFFFF
#define S_00B848_DEBUG_MODE(x) (((x) & 0x1) << 22)
#define G_00B848_DEBUG_MODE(x) (((x) >> 22) & 0x1)
#define C_00B848_DEBUG_MODE 0xFFBFFFFF
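
For context (not part of the patch), a small C sketch of how these S_/G_/C_ helper macros are conventionally used to set, get, and clear a field; note that RR_WG_MODE occupies bit 21, the same position DX10_CLAMP uses on earlier generations. Variable names are illustrative only:

#include <stdint.h>
#include <stdio.h>

#define S_00B848_RR_WG_MODE(x) (((x) & 0x1) << 21)  /* set: place value into bit 21 */
#define G_00B848_RR_WG_MODE(x) (((x) >> 21) & 0x1)  /* get: read bit 21 back out */
#define C_00B848_RR_WG_MODE    0xFFDFFFFF           /* clear: mask with bit 21 zeroed */

int main(void) {
  uint32_t rsrc1 = 0;
  rsrc1 |= S_00B848_RR_WG_MODE(1);                  /* enable round-robin scheduling */
  printf("RR_WG_MODE = %u\n", G_00B848_RR_WG_MODE(rsrc1));
  rsrc1 &= C_00B848_RR_WG_MODE;                     /* disable it again */
  printf("RR_WG_MODE = %u\n", G_00B848_RR_WG_MODE(rsrc1));
  return 0;
}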