[Mlir-commits] [mlir] [MLIR] Use `test-lower-to-nvvm` for sm_90 Integration Tests on GitHub (PR #68184)
Guray Ozen
llvmlistbot at llvm.org
Wed Oct 4 00:41:02 PDT 2023
================
@@ -167,6 +166,20 @@ void buildGpuPassPipeline(OpPassManager &pm,
void buildLowerToNVVMPassPipeline(OpPassManager &pm,
const TestLowerToNVVMOptions &options) {
+ // Start with a cleanup pass.
+ pm.addPass(createCanonicalizerPass());
+ pm.addPass(createCSEPass());
+
+ //===----------------------------------------------------------------------===//
+ // NVGPU lowers device code as well as host code to the driver, so must run
+ // before outlining.
+ //===----------------------------------------------------------------------===//
+ // TODO: C++20 designated initializers.
+ ConvertNVGPUToNVVMPassOptions convertNVGPUToNVVMPassOptions;
----------------
grypp wrote:
Doing this helps me in two ways:
1) It preserves the `nvgpu.tensormap.descriptor` type information for `nvgpu.tma.async.load`.
2) It resolves the descriptor type before GPU outlining, so only the device pointer needs to be passed into the kernel.
To elaborate further, consider this example:
```
%d = nvgpu.tma.create.descriptor %0 box[%c128, %c64] : memref<*xf16>
-> !nvgpu.tensormap.descriptor<tensor = !shmemlhs, swizzle = swizzle_128b, l2promo = none, oob = zero, interleave = none>
...
gpu.launch() {
nvgpu.tma.async.load %d[...]...
}
```
`tma.create.descriptor`:
1) Invokes the CUDA driver to generate the TMA descriptor,
2) Returns only the device pointer.
`tma.async.load`:
1) Generates the PTX for the TMA load,
2) Requires knowledge of the `l2promo` and `swizzle` settings.
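To make 2) concrete, here is a rough sketch of what the IR looks like once the conversion has run but before outlining. The runtime-wrapper name `mgpuTensorMapEncodeTiledMemref` and the NVVM op operand list are approximations of the lowering, not its exact form:
```
// Sketch only: names and operand lists are illustrative.
// On the host, the descriptor has been resolved to a plain device pointer:
%d = llvm.call @mgpuTensorMapEncodeTiledMemref(...) : (...) -> !llvm.ptr
gpu.launch() {
  // The TMA load was rewritten while the swizzle/l2promo settings were
  // still visible in the descriptor type, so the PTX is already correct;
  // outlining only has to pass %d (a bare pointer) into the kernel.
  nvvm.cp.async.bulk.tensor.shared.cluster.global %dest, %d, ...
}
```
If the conversion ran after outlining instead, the `!nvgpu.tensormap.descriptor` type would have to cross the kernel boundary just to carry the swizzle/l2promo information, which is exactly what this ordering avoids.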
https://github.com/llvm/llvm-project/pull/68184