[Mlir-commits] [mlir] [mlir][gpu] Add GPU subgroup MMA extract and insert operations (PR #139048)

Wed May 21 00:10:12 PDT 2025

================
@@ -1919,6 +1919,79 @@ def GPU_SubgroupMmaConstantMatrixOp : GPU_Op<"subgroup_mma_constant_matrix",
   }];
 }
 
+def GPU_SubgroupMmaExtractOp : GPU_Op<"subgroup_mma_extract",
+    [Pure,
+     TypesMatchWith<"value type matches element type of mma_matrix",
+                    "matrix", "res",
+                    "::llvm::cast<gpu::MMAMatrixType>($_self).getElementType()">]>{
+
+  let summary = "Extract a value from GPU warp by invocation and indices";
+
+  let description = [{
+    The `gpu.subgroup_mma_extract` operation extracts a value from `!gpu.mma_matrix`
+    by the invocation in a subgroup.
+
+    This operation takes `!gpu.mma_matrix` as its first operand. It is the source
+    matrix across a subgroup. The op returns a scalar value stored in the invocation
+    in the subgroup. If there are multiple values packed in an invocation, use
+    `indices` to specify the element to extract.
+
+    Example:
+
+    ```mlir
+    %c0 = arith.constant 0 : index
+    %val = gpu.subgroup_mma_extract %m[%c0] : !gpu.mma_matrix<16x16xf32, "AOp"> -> f32
----------------
Hsiangkai wrote:

For a cooperative matrix, values are distributed across multiple threads. Take a 4x4 matrix as an example: its 16 values are spread over 16 threads, so each thread holds one value and the index is always 0 in any given thread. However, a thread can also hold multiple packed values. For example, a 4x8 matrix may store 2 values per thread across 16 threads; in that case, the index may be 0 or 1 in a given thread.

The indexing follows the syntax of spirv.OpCompositeExtract and spirv.OpCompositeInsert.
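
To make the packed case concrete, here is a minimal sketch that reuses the op syntax from the example in the description. It assumes `%m` is defined earlier and that the target packs at least two elements of the matrix into each invocation; the actual number of elements per invocation depends on the matrix shape, the subgroup size, and the target, so index 1 being in range is an assumption here.

```mlir
// Assumes %m is an !gpu.mma_matrix value defined earlier and that the
// target packs at least two elements of it into each invocation.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
// First element held by the current invocation.
%v0 = gpu.subgroup_mma_extract %m[%c0] : !gpu.mma_matrix<16x16xf32, "AOp"> -> f32
// Second element held by the same invocation.
%v1 = gpu.subgroup_mma_extract %m[%c1] : !gpu.mma_matrix<16x16xf32, "AOp"> -> f32
```

With one value per invocation, as in the 4x4 case above, only the `%c0` extract would be meaningful.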

https://github.com/llvm/llvm-project/pull/139048