[Mlir-commits] [mlir] [MLIR][NVGPU] Introduce `warpgroup.init.accumulator` Op (PR #67530)

Tue Oct 10 11:06:06 PDT 2023

================
@@ -727,4 +727,15 @@ def NVGPU_WarpgroupMmaOp : NVGPU_Op<"warpgroup.mma"> {
   let hasVerifier = 1;
 }
 
+def NVGPU_WarpgroupMmaInitAccumulatorOp : NVGPU_Op<"warpgroup.mma.init.accumulator"> {  
+  let summary = "Initialize accumulator matrix for `warpgroup.mma`";
+
+  let description = [{
+    This Op generates and initializes the accumulator matrix for the
+    `nvgpu.warpgroup.mma` op to perform matrix-multiply-and-accumulate (mma).
+  }];
+  let results = (outs Variadic<NVGPU_WarpgroupAccumulator>:$matrixC);
----------------
qcolombet wrote:

> I plan to simplify the `nvgpu.warpgroup.accumulator` type by allowing larger types such as `<128x128>` instead of 2x`<64x128>`. The transformation will handle the creation of multiple structs if necessary. This change will let me manage everything with a single Op and eliminate the need for variadic operands and results in these three ops.
> 
> Here's how the updated IR will look (instead of option 1 above):
> 
> ```
> // Init
> %matrixC = nvgpu.warpgroup.mma.init.accumulator ->
>                     !nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
> 
> // GEMM
> %matrixD = nvgpu.warpgroup.mma %descA, %descB, %matrixC ...
> 
> // Epilogue
> nvgpu.warpgroup.mma.store %matrixD to %sharedMemoryBuffer
>   : !nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
>     into memref<128x128xf32,3>
> ```
> 
> I believe it's a good idea to proceed with this PR first; I will put up the simplification PR afterwards.

Works for me

https://github.com/llvm/llvm-project/pull/67530