[all-commits] [llvm/llvm-project] d20fbc: [MLIR][NVGPU] Introduce `nvgpu.warpgroup.mma.store`...
Guray Ozen via All-commits
all-commits at lists.llvm.org
Thu Oct 5 01:54:26 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: d20fbc900783db3a87c8da622ede280d93f890bb
https://github.com/llvm/llvm-project/commit/d20fbc900783db3a87c8da622ede280d93f890bb
Author: Guray Ozen <guray.ozen at gmail.com>
Date: 2023-10-05 (Thu, 05 Oct 2023)
Changed paths:
M mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
M mlir/lib/Conversion/NVGPUToNVVM/CMakeLists.txt
M mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
M mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
M mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
M utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
Log Message:
-----------
[MLIR][NVGPU] Introduce `nvgpu.warpgroup.mma.store` Op for Hopper GPUs (#65441)
This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to store the
fragmented result(s) of type `nvgpu.warpgroup.accumulator` produced by
`warpgroup.mma` into the given memref.
An example of a fragmented matrix is given here:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d
The `warpgroup.mma.store` op does the following:
1) Takes one or more values of `nvgpu.warpgroup.accumulator` type (fragmented
result matrices).
2) Calculates per-thread indexes within the warpgroup and stores the data into
the given memref (see the conceptual sketch after the example below).
Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...
// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD :
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>
to memref<128x128xf32,3>
```
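For intuition, here is a rough, hand-written sketch of what step 2 amounts to
for a single thread: derive a (row, col) position from the thread's warp and
lane ids and write the values it holds with ordinary memref stores. The
function name, arguments, and the simplified index arithmetic below are
illustrative only, not taken from this commit; the real per-thread mapping
follows the wgmma fragment layout in the PTX documentation linked above, and
the actual lowering is implemented in NVGPUToNVVM.cpp.
```
// Conceptual sketch only: one thread of the 128-thread warpgroup storing
// two of its accumulator values into the destination memref.
func.func @conceptual_per_thread_store(%v0: f32, %v1: f32,
                                       %d: memref<128x128xf32, 3>,
                                       %tid: index) {
  %c1  = arith.constant 1 : index
  %c2  = arith.constant 2 : index
  %c4  = arith.constant 4 : index
  %c16 = arith.constant 16 : index
  %c32 = arith.constant 32 : index
  // Split the warpgroup-local thread id into a warp id and a lane id.
  %warp = arith.divui %tid, %c32 : index
  %lane = arith.remui %tid, %c32 : index
  // Simplified (row, col) for one pair of elements owned by this thread:
  // each warp covers a 16-row slice of the result, with lanes arranged
  // 8 x 4 inside it. The exact arithmetic is a placeholder.
  %rowBase = arith.muli %warp, %c16 : index
  %rowOff  = arith.divui %lane, %c4 : index
  %row     = arith.addi %rowBase, %rowOff : index
  %colGrp  = arith.remui %lane, %c4 : index
  %col0    = arith.muli %colGrp, %c2 : index
  %col1    = arith.addi %col0, %c1 : index
  // Plain stores into the destination memref.
  memref.store %v0, %d[%row, %col0] : memref<128x128xf32, 3>
  memref.store %v1, %d[%row, %col1] : memref<128x128xf32, 3>
  return
}
```
In the real lowering each thread stores several such element pairs, one per
sub-tile of the accumulator, but the overall shape is the same: index
arithmetic per thread followed by ordinary stores into the memref.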