[Mlir-commits] [mlir] [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (PR #104851)
Jakub Kuderski
llvmlistbot at llvm.org
Tue Aug 20 09:46:58 PDT 2024
================
@@ -1198,21 +1198,31 @@ def GPU_SubgroupReduceOp : GPU_Op<"subgroup_reduce", [SameOperandsAndResultType]
let summary = "Reduce values among subgroup.";
let description = [{
The `subgroup_reduce` op reduces the value of every lane (work item) across
- a subgroup. The result is equal for all lanes.
+ a subgroup.
When the reduced value is of a vector type, each vector element is reduced
independently. Only 1-d vector types are allowed.
Example:
```mlir
- %1 = gpu.subgroup_reduce add %a : (f32) -> (f32)
- %2 = gpu.subgroup_reduce add %b : (vector<4xf16>) -> (vector<4xf16>)
+ %1 = gpu.subgroup_reduce add %a : (f32) -> f32
+ %2 = gpu.subgroup_reduce add %b : (vector<4xf16>) -> vector<4xf16>
+ %3 = gpu.subgroup_reduce add %c cluster_size(4) : (f32) -> f32
```
If `uniform` flag is set either none or all lanes of a subgroup need to execute
- this op in convergence. The reduction operation must be one
- of:
+ this op in convergence.
+
+ If a `cluster_size` is not provided, the reduction covers all lanes of the
+ subgroup and the result is equal for all lanes.
+
+ If a `cluster_size` is provided, the subgroup is divided into clusters of
+ `cluster_size` contiguous lanes each, a reduction is done for all lanes of
+ each cluster (in parallel), and the result is equal for all lanes in a
+ cluster.
----------------
kuhar wrote:
I think it's fine either way. My issue with the explicit wording is that the subgroup_reduce op doesn't know the subgroup size, so it's a bit awkward to talk about it in the semantics.
https://github.com/llvm/llvm-project/pull/104851
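The `cluster_size` semantics described in the diff can be checked with a small Python model. This is a minimal sketch under stated assumptions, not the MLIR implementation. The function names `reference_cluster_reduce` and `butterfly_cluster_reduce` are hypothetical. The subgroup is modeled as a list whose length is the subgroup size. That is knowledge the op itself does not have, which is the reviewer's point. The butterfly variant shows one common XOR-shuffle strategy, offered only as an illustration; it is not necessarily how MLIR lowers the op. It assumes `cluster_size` is a power of two and that the combining op is associative and commutative.

```python
from operator import add
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def reference_cluster_reduce(values: List[T], op: Callable[[T, T], T],
                             cluster_size: Optional[int] = None) -> List[T]:
    """Hypothetical reference model: values[i] is the operand of lane i.

    Without cluster_size the whole subgroup is one cluster. Otherwise lanes are
    split into contiguous clusters of cluster_size lanes, each cluster is reduced
    independently, and every lane receives its own cluster's result.
    """
    n = len(values)
    size = n if cluster_size is None else cluster_size
    if size <= 0 or n % size != 0:
        raise ValueError("subgroup size must be a positive multiple of cluster_size")
    result: List[T] = []
    for start in range(0, n, size):
        cluster = values[start:start + size]
        acc = cluster[0]
        for v in cluster[1:]:
            acc = op(acc, v)
        result.extend([acc] * size)  # broadcast the cluster result to its lanes
    return result


def butterfly_cluster_reduce(values: List[T], op: Callable[[T, T], T],
                             cluster_size: int) -> List[T]:
    """Hypothetical XOR-butterfly sketch: each step combines lane i with lane i ^ offset.

    Because offset < cluster_size (a power of two), i ^ offset never leaves the
    cluster containing i, so after log2(cluster_size) steps every lane holds the
    reduction of its whole cluster.
    """
    if cluster_size <= 0 or cluster_size & (cluster_size - 1):
        raise ValueError("butterfly sketch assumes a power-of-two cluster_size")
    if len(values) % cluster_size != 0:
        raise ValueError("subgroup size must be a multiple of cluster_size")
    vals = list(values)
    offset = 1
    while offset < cluster_size:
        vals = [op(vals[lane], vals[lane ^ offset]) for lane in range(len(vals))]
        offset *= 2
    return vals
```

For an 8-lane subgroup holding `0..7`, `cluster_size` 4 yields `6` on lanes 0 to 3 and `22` on lanes 4 to 7. Omitting `cluster_size` yields `28` on every lane, matching the "result is equal for all lanes" wording. With floating-point operands the butterfly order can round differently from a sequential reduction, so the sketch compares the two only on integers.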