[Mlir-commits] [mlir] [mlir][gpu] Add GPU subgroup MMA extract and insert operations (PR #139048)
Krzysztof Drewniak
llvmlistbot at llvm.org
Thu May 22 09:18:42 PDT 2025
================
@@ -1919,6 +1919,81 @@ def GPU_SubgroupMmaConstantMatrixOp : GPU_Op<"subgroup_mma_constant_matrix",
}];
}
+def GPU_SubgroupMmaExtractOp : GPU_Op<"subgroup_mma_extract",
+ [Pure,
+ TypesMatchWith<"value type matches element type of mma_matrix",
+ "matrix", "res",
+ "::llvm::cast<gpu::MMAMatrixType>($_self).getElementType()">]>{
+
+ let summary = "Extract a value from GPU warp by invocation and indices";
+
+ let description = [{
+ The `gpu.subgroup_mma_extract` operation extracts a value from `!gpu.mma_matrix`
+ by the invocation in a subgroup.
+
+ This operation takes `!gpu.mma_matrix` as its first operand. It is the source
+ matrix across a subgroup. The op returns a scalar value stored in the invocation
+ in the subgroup. The values of !gpu.mma_matrix are stored across multiple
+ threads in the subgroup. If there are multiple values packed in a thread, use
+ `indices` to specify the element in the local thread to extract.
----------------
krzysz00 wrote:
```suggestion
The `gpu.subgroup_mma_extract` operation extracts a value from `!gpu.mma_matrix` that is stored at subgroup level.
Since `matrix` is packed into the threads within a subgroup, `indices` are the indices into the values stored by each thread. That is, an index of 0 (or [0, 0]) does not necessarily refer to the first element of the matrix, but to the first element that a particular thread holds.
The mapping of matrix elements to threads is not defined by this operation and may not be defined by some lowerings (such as the lowering to SPIR-V). However, if the subgroup size is S and the matrix is M x N, then applying `subgroup_mma_extract` at each index in `[0, (M * N) / S)` extracts the entire matrix across the subgroup.
```
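To illustrate the coverage claim in the suggested wording, here is a small Python model. It is only a sketch: the layout in `owner_of` (thread `t` holds flat elements `t`, `t + S`, `t + 2S`, ...) is a made-up assumption for the example, since the op deliberately leaves the element-to-thread mapping undefined. The names `M`, `N`, `S`, `owner_of`, and `extract_all` are all hypothetical and not part of the dialect.

```python
# Hypothetical model of an M x N matrix distributed over a subgroup of S threads.
# The real mapping is NOT defined by gpu.subgroup_mma_extract; this layout is
# an assumption chosen purely for illustration.
M, N, S = 16, 16, 32
ELEMS_PER_THREAD = (M * N) // S  # number of valid per-thread indices


def owner_of(thread_id, local_index):
    """Return the (row, col) that `thread_id` holds at `local_index`.

    Assumed layout: thread t holds flat elements t, t + S, t + 2S, ...
    """
    flat = local_index * S + thread_id
    return divmod(flat, N)


def extract_all():
    """Collect the matrix coordinates reached by extracting every local index
    on every thread of the subgroup."""
    covered = set()
    for local_index in range(ELEMS_PER_THREAD):
        for thread_id in range(S):
            covered.add(owner_of(thread_id, local_index))
    return covered


covered = extract_all()
```

Under this assumed layout, local index 0 on thread 5 refers to matrix element (0, 5), not (0, 0), and iterating over all indices in `[0, (M * N) / S)` on every thread reaches each matrix element exactly once.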
https://github.com/llvm/llvm-project/pull/139048