[Mlir-commits] [mlir] [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (PR #104851)

Tue Aug 20 09:41:22 PDT 2024

================
@@ -1198,21 +1198,31 @@ def GPU_SubgroupReduceOp : GPU_Op<"subgroup_reduce", [SameOperandsAndResultType]
   let summary = "Reduce values among subgroup.";
   let description = [{
     The `subgroup_reduce` op reduces the value of every lane (work item) across
-    a subgroup. The result is equal for all lanes.
+    a subgroup.
 
     When the reduced value is of a vector type, each vector element is reduced
     independently. Only 1-d vector types are allowed.
 
     Example:
 
     ```mlir
-    %1 = gpu.subgroup_reduce add %a : (f32) -> (f32)
-    %2 = gpu.subgroup_reduce add %b : (vector<4xf16>) -> (vector<4xf16>)
+    %1 = gpu.subgroup_reduce add %a : (f32) -> f32
+    %2 = gpu.subgroup_reduce add %b : (vector<4xf16>) -> vector<4xf16>
+    %3 = gpu.subgroup_reduce add %c cluster_size(4) : (f32) -> f32
     ```
 
     If `uniform` flag is set either none or all lanes of a subgroup need to execute
-    this op in convergence. The reduction operation must be one
-    of:
+    this op in convergence.
+
+    If a `cluster_size` is not provided, the reduction covers all lanes of the
+    subgroup and the result is equal for all lanes.
+
+    If a `cluster_size` is provided, the subgroup is divided into clusters of
+    `cluster_size` contiguous lanes each, a reduction is done for all lanes of
+    each cluster (in parallel), and the result is equal for all lanes in a
+    cluster.
----------------
andfau-amd wrote:

Proposed alternative wording:

> The subgroup is divided into clusters of `cluster_size` contiguous lanes each, and a reduction is done for all lanes of each cluster (in parallel). The result is equal for all lanes in a cluster. When `cluster_size` is omitted, there is a single cluster with the size of the subgroup.
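
To make the intended semantics concrete, here is a rough reference model in Python. It is only a sketch of the wording above, not the lowering: the function name `subgroup_reduce_model`, the list-of-lanes representation, and the requirement that the subgroup size be a multiple of `cluster_size` are assumptions made for illustration.

```python
from functools import reduce
import operator


def subgroup_reduce_model(lanes, op=operator.add, cluster_size=None):
    """Hypothetical model: lanes[i] is the value held by lane i of one subgroup.

    Returns the per-lane results after the (clustered) reduction.
    """
    subgroup_size = len(lanes)
    # Omitted cluster_size: a single cluster spanning the whole subgroup.
    if cluster_size is None:
        cluster_size = subgroup_size
    # Assumption of this sketch: clusters tile the subgroup exactly.
    if cluster_size <= 0 or subgroup_size % cluster_size != 0:
        raise ValueError("cluster_size must evenly divide the subgroup size")

    results = []
    for start in range(0, subgroup_size, cluster_size):
        # Each cluster is a run of contiguous lanes, reduced independently.
        cluster = lanes[start:start + cluster_size]
        reduced = reduce(op, cluster)
        # Every lane in the cluster observes the same result.
        results.extend([reduced] * cluster_size)
    return results
```

With 8 lanes holding `0..7` and `cluster_size=4`, lanes 0-3 each get `6` and lanes 4-7 each get `22`; with `cluster_size` omitted, all 8 lanes get `28`.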

https://github.com/llvm/llvm-project/pull/104851