[all-commits] [llvm/llvm-project] 17649a: [MLIR][NVGPU] Introduce `nvgpu.mbarrier.group` for...
Guray Ozen via All-commits
all-commits at lists.llvm.org
Fri Sep 22 08:09:57 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 17649a7726d3ce1ddba2bbf3ef73af03ea204753
https://github.com/llvm/llvm-project/commit/17649a7726d3ce1ddba2bbf3ef73af03ea204753
Author: Guray Ozen <guray.ozen at gmail.com>
Date: 2023-09-22 (Fri, 22 Sep 2023)
Changed paths:
M mlir/include/mlir/Conversion/NVGPUToNVVM/NVGPUToNVVM.h
M mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
M mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
M mlir/lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp
M mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
M mlir/test/Dialect/NVGPU/tmaload-transform.mlir
Log Message:
-----------
[MLIR][NVGPU] Introduce `nvgpu.mbarrier.group` for multiple mbarrier use (#65951)
A common practice involves creating multiple `mbarrier` objects; see the
example below. This is particularly valuable in scenarios like software
pipelining for GEMM, where we need to create multiple barriers
dynamically, then use and wait on them in a loop.
This PR generalizes the `nvgpu.mbarrier.barrier` type into
`nvgpu.mbarrier.group`. All `mbarrier`-related ops now use this type.
Consequently, these ops are capable of managing multiple barriers
seamlessly.
A static `num_barriers = 4` lets us place the mbarrier object(s) in
static shared memory. We could make the value dynamic, but that would
require dynamic shared memory and complicate the codegen.
```
%barriers = nvgpu.mbarrier.create -> !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c0], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c1], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c2], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c3], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
...
scf.for %i = %c0 to %n step %c1 {
  nvgpu.mbarrier.try_wait %barriers[ (i % 4) ] ...
  // ... Do work once the mbarrier is ready
  nvgpu.mbarrier.arrive.expect_tx %barriers[ ((i + 3) % 4) ] ...
}
```
This produces the following mbarrier usage pattern:
```
expect_tx[0]
expect_tx[1]
expect_tx[2]
Loop:
try_wait mbarrier[0], expect_tx[3]
try_wait mbarrier[1], expect_tx[0]
try_wait mbarrier[2], expect_tx[1]
try_wait mbarrier[3], expect_tx[2]
...
```
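The circular indexing behind this schedule can be sketched in plain Python (this is an illustration of the indexing scheme only, not part of the commit; `NUM_BARRIERS`, `PIPELINE_DEPTH`, and `schedule` are hypothetical names):

```python
# Models the rotating mbarrier indices used by the software-pipelined
# loop above: wait on barrier i % 4, then re-arm barrier (i + 3) % 4.
NUM_BARRIERS = 4    # matches num_barriers = 4 in the MLIR example
PIPELINE_DEPTH = 3  # barriers primed before the loop (expect_tx[0..2])

def schedule(n_iters):
    ops = []
    # Prologue: prime the first PIPELINE_DEPTH barriers.
    for j in range(PIPELINE_DEPTH):
        ops.append(("expect_tx", j))
    # Steady state: each iteration waits on one barrier and re-arms
    # the barrier that is PIPELINE_DEPTH stages ahead, modulo the group.
    for i in range(n_iters):
        ops.append(("try_wait", i % NUM_BARRIERS))
        ops.append(("expect_tx", (i + PIPELINE_DEPTH) % NUM_BARRIERS))
    return ops

for op, idx in schedule(4):
    print(op, idx)
```

Running it reproduces the trace above: `try_wait` cycles 0, 1, 2, 3 while `expect_tx` stays three stages ahead (3, 0, 1, 2).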