[llvm-branch-commits] [llvm] [AMDGPU] Update documentation for wave reduction intrinsics (PR #175132)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Wed Jan 28 01:26:19 PST 2026
https://github.com/easyonaadit updated https://github.com/llvm/llvm-project/pull/175132
From 14d40610272bf26f3cf3c3b15b314e99e201c059 Mon Sep 17 00:00:00 2001
From: Aaditya <Aaditya.AlokDeshpande at amd.com>
Date: Fri, 9 Jan 2026 12:05:04 +0530
Subject: [PATCH 1/2] [AMDGPU] Update documentation for wave reduction
intrinsics
---
llvm/docs/AMDGPUUsage.rst | 74 ++++++++++++++++++++++++++++++++++++---
1 file changed, 70 insertions(+), 4 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 39280a37e8d30..c46018bdaa491 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1378,9 +1378,19 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
0: Target default preference,
1: `Iterative strategy`, and
2: `DPP`.
- If target does not support the DPP operations (e.g. gfx6/7),
+ If the target does not support the DPP operations (e.g. gfx6/7),
reduction will be performed using default iterative strategy.
- Intrinsic is currently only implemented for i32.
+ Intrinsic is implemented for i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.min Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a signed min
+ reduction on signed integers.
+ Intrinsic is implemented for i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.fmin Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a floating point min
+ reduction on floating point values.
+ Intrinsic is implemented for float and double types.
+ NAN values are not canonnicalized.
+ The ordering behaviour of SNANs is non-deterministic.
llvm.amdgcn.wave.reduce.umax Performs an arithmetic unsigned max reduction on the unsigned values
provided by each lane in the wavefront.
@@ -1388,9 +1398,65 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
0: Target default preference,
1: `Iterative strategy`, and
2: `DPP`.
- If target does not support the DPP operations (e.g. gfx6/7),
+ If the target does not support the DPP operations (e.g. gfx6/7),
reduction will be performed using default iterative strategy.
- Intrinsic is currently only implemented for i32.
+ Intrinsic is implemented for i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.max Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a signed max
+ reduction on signed integers.
+ Intrinsic is implemented for i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.fmax Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a floating point max
+ reduction on floating point values.
+ Intrinsic is implemented for float and double types.
+ NAN values are not canonnicalized.
+ The ordering behaviour of SNANs is non-deterministic.
+
+ llvm.amdgcn.wave.reduce.add Performs an arithmetic add reduction on the signed/unsigned values
+ provided by each lane in the wavefront.
+ Intrinsic takes a hint for reduction strategy using second operand
+ 0: Target default preference,
+ 1: `Iterative strategy`, and
+ 2: `DPP`.
+ If the target does not support the DPP operations (e.g. gfx6/7),
+ reduction will be performed using default iterative strategy.
+ Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.fadd Similar to `llvm.amdgcn.wave.reduce.add`, but performs a floating point add
+ reduction on floating point values.
+ Intrinsic is implemented for float and double types.
+
+ llvm.amdgcn.wave.reduce.sub Performs an arithmetic sub reduction on the signed/unsigned values
+ provided by each lane in the wavefront.
+ Intrinsic takes a hint for reduction strategy using second operand
+ 0: Target default preference,
+ 1: `Iterative strategy`, and
+ 2: `DPP`.
+ If the target does not support the DPP operations (e.g. gfx6/7),
+ reduction will be performed using default iterative strategy.
+ Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.fsub Similar to `llvm.amdgcn.wave.reduce.sub`, but performs a floating point sub
+ reduction on floating point values.
+ Intrinsic is implemented for float and double types.
+
+ llvm.amdgcn.wave.reduce.and Performs a bitwise-and reduction on the values
+ provided by each lane in the wavefront.
+ Intrinsic takes a hint for reduction strategy using second operand
+ 0: Target default preference,
+ 1: `Iterative strategy`, and
+ 2: `DPP`.
+ If the target does not support the DPP operations (e.g. gfx6/7),
+ reduction will be performed using default iterative strategy.
+ Intrinsic is implemented for i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.or Similar to `llvm.amdgcn.wave.reduce.and`, but performs a bitwise-or
+ reduction on the values provided by each lane in the wavefront.
+ Intrinsic is implemented for i32 and i64 types.
+
+ llvm.amdgcn.wave.reduce.xor Similar to `llvm.amdgcn.wave.reduce.and`, but performs a bitwise-xor
+ reduction on the values provided by each lane in the wavefront.
+ Intrinsic is implemented for i32 and i64 types.
llvm.amdgcn.permlane16 Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
operation within a row (16 contiguous lanes) of the second input operand.
From 6f90a85d166b74dd388e69c3c4cdd3aa301cba28 Mon Sep 17 00:00:00 2001
From: Aaditya <Aaditya.AlokDeshpande at amd.com>
Date: Wed, 28 Jan 2026 12:05:12 +0530
Subject: [PATCH 2/2] Modelled fmin/fmax similar to llvm.minimumnum/maximumnum
---
llvm/docs/AMDGPUUsage.rst | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c46018bdaa491..cba7abfc56e57 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1389,7 +1389,10 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
llvm.amdgcn.wave.reduce.fmin Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a floating point min
reduction on floating point values.
Intrinsic is implemented for float and double types.
- NAN values are not canonnicalized.
+ Intrinsic is modelled similarly to the `llvm.minnum` intrinsic.
+ For a reduction between two NAN values, a NAN is returned.
+ For a reduction between a NAN and a number, the number is returned.
+ -0.0 < +0.0 is true for this reduction.
The ordering behaviour of SNANs is non-deterministic.
llvm.amdgcn.wave.reduce.umax Performs an arithmetic unsigned max reduction on the unsigned values
@@ -1409,7 +1412,10 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
llvm.amdgcn.wave.reduce.fmax Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a floating point max
reduction on floating point values.
Intrinsic is implemented for float and double types.
- NAN values are not canonnicalized.
+ Intrinsic is modelled similarly to the `llvm.maxnum` intrinsic.
+ For a reduction between two NAN values, a NAN is returned.
+ For a reduction between a NAN and a number, the number is returned.
+ -0.0 < +0.0 is true for this reduction.
The ordering behaviour of SNANs is non-deterministic.
llvm.amdgcn.wave.reduce.add Performs an arithmetic add reduction on the signed/unsigned values
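
The NaN and signed-zero rules that PATCH 2/2 documents for
llvm.amdgcn.wave.reduce.fmin can be illustrated with a small scalar model.
This is only a sketch of the documented semantics, not the backend lowering:
the names `fmin_pair` and `wave_reduce_fmin` are hypothetical, the lanes are
modelled as a non-empty Python list, and signalling NaNs (whose ordering the
patch describes as non-deterministic) are not distinguished from quiet NaNs.

```python
import math
from functools import reduce


def fmin_pair(a: float, b: float) -> float:
    """Combine two lane values under the rules stated in the patch.

    Hypothetical helper: NaN vs. number returns the number, NaN vs. NaN
    returns a NaN, and -0.0 is ordered below +0.0.
    """
    if math.isnan(a):
        return b  # if b is also NaN, the result is NaN
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # Both are zeros; prefer the negative one so that -0.0 < +0.0 holds.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def wave_reduce_fmin(lanes):
    """Reduce a non-empty list of per-lane values (assumed: all lanes active)."""
    return reduce(fmin_pair, lanes)


if __name__ == "__main__":
    print(wave_reduce_fmin([3.0, float("nan"), 1.5, 2.0]))
    print(wave_reduce_fmin([0.0, -0.0]))
```

The fmax case follows by flipping the comparison and preferring +0.0 over
-0.0; the strategy hint operand does not affect the result and is therefore
left out of the model.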