[Mlir-commits] [mlir] [MLIR][NVGPU] Introduce `warpgroup.init.accumulator` Op (PR #67530)
llvmlistbot at llvm.org
Tue Oct 10 11:06:06 PDT 2023
================
@@ -727,4 +727,15 @@ def NVGPU_WarpgroupMmaOp : NVGPU_Op<"warpgroup.mma"> {
let hasVerifier = 1;
}
+def NVGPU_WarpgroupMmaInitAccumulatorOp : NVGPU_Op<"warpgroup.mma.init.accumulator"> {
+ let summary = "Initialize accumulator matrix for `warpgroup.mma`";
+
+ let description = [{
+ This Op generates and initializes the accumulator matrix for
+ `nvgpu.warpgroup.mma` op to perform matrix-multiply-and-accumulate (mma).
+ }];
+ let results = (outs Variadic<NVGPU_WarpgroupAccumulator>:$matrixC);
----------------
qcolombet wrote:
> I plan to simplify the `nvgpu.warpgroup.accumulator` type by allowing larger types such as `<128x128>` instead of two `<64x128>` values. The transformation will handle the creation of multiple structs if necessary. This change will let me manage everything with a single Op and eliminate the need for variadics in these three ops.
>
> Here's how the updated IR will look (instead of option 1 above):
>
> ```
> // Init
> %matrixC = nvgpu.warpgroup.mma.init.accumulator ->
> !nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
>
> // GEMM
> %matrixD = nvgpu.warpgroup.mma %descA, %descB, %matrixC ...
>
> // Epilogue
> nvgpu.warpgroup.mma.store %matrixD to %sharedMemoryBuffer
> : !nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
> into memref<128x128xf32,3>
> ```
>
> I believe it's a good idea to proceed with this PR first; I will put up the simplification PR afterwards.
Works for me
https://github.com/llvm/llvm-project/pull/67530