[Mlir-commits] [mlir] [MLIR] Use `test-lower-to-nvvm` for sm_90 Integration Tests on GitHub (PR #68184)
Guray Ozen
llvmlistbot at llvm.org
Wed Oct 4 00:41:02 PDT 2023
================
@@ -167,6 +166,20 @@ void buildGpuPassPipeline(OpPassManager &pm,
void buildLowerToNVVMPassPipeline(OpPassManager &pm,
const TestLowerToNVVMOptions &options) {
+ // Start with a cleanup pass.
+ pm.addPass(createCanonicalizerPass());
+ pm.addPass(createCSEPass());
+
+ //===----------------------------------------------------------------------===//
+ // NVGPU lowers device code as well as host code to the driver, so must run
+ // before outlining.
+ //===----------------------------------------------------------------------===//
+ // TODO: C++20 designated initializers.
+ ConvertNVGPUToNVVMPassOptions convertNVGPUToNVVMPassOptions;
----------------
grypp wrote:
Doing this helps me in two ways:
1) It preserves the `nvgpu.tensormap.descriptor` type information for `nvgpu.tma.async.load`.
2) It resolves the descriptor type before GPU outlining, so only the device pointer needs to be passed into the kernel.
To elaborate further, consider this example:
```
%d = nvgpu.tma.create.descriptor %0 box[%c128, %c64] : memref<*xf16>
-> !nvgpu.tensormap.descriptor<tensor = !shmemlhs, swizzle = swizzle_128b, l2promo = none, oob = zero, interleave = none>
...
gpu.launch() {
nvgpu.tma.async.load %d[...]...
}
```
`tma.create.descriptor`:
1) Invokes the CUDA driver to generate the TMA descriptor,
2) Returns only the device pointer.
`tma.async.load`:
1) Generates the PTX for the TMA load,
2) Requires knowledge of the `l2promo` and `swizzle` settings.
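To make 2) concrete, here is a rough sketch of what the IR looks like once the conversion has run but before outlining. The runtime-wrapper name `mgpuTensorMapEncodeTiledMemref` and the NVVM op operand list are approximations of the lowering, not its exact form:
```
// Sketch only: names and operand lists are illustrative.
// On the host, the descriptor has been resolved to a plain device pointer:
%d = llvm.call @mgpuTensorMapEncodeTiledMemref(...) : (...) -> !llvm.ptr
gpu.launch() {
  // The TMA load was rewritten while the swizzle/l2promo settings were
  // still visible in the descriptor type, so the PTX is already correct;
  // outlining only has to pass %d (a bare pointer) into the kernel.
  nvvm.cp.async.bulk.tensor.shared.cluster.global %dest, %d, ...
}
```
If the conversion ran after outlining instead, the `!nvgpu.tensormap.descriptor` type would have to cross the kernel boundary just to carry the swizzle/l2promo information, which is exactly what this ordering avoids.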
https://github.com/llvm/llvm-project/pull/68184