[Mlir-commits] [mlir] [mlir][gpu] Deprecate gpu::Serialization* passes. (PR #65857)

Tue Sep 12 17:49:51 PDT 2023

================
@@ -78,11 +78,13 @@ void mlir::sparse_tensor::buildSparseCompiler(
 
   // Finalize GPU code generation.
   if (gpuCodegen) {
-#if MLIR_GPU_TO_CUBIN_PASS_ENABLE
-    pm.addNestedPass<gpu::GPUModuleOp>(createGpuSerializeToCubinPass(
-        options.gpuTriple, options.gpuChip, options.gpuFeatures));
-#endif
+    GpuNVVMAttachTargetOptions nvvmTargetOptions;
+    nvvmTargetOptions.triple = options.gpuTriple;
+    nvvmTargetOptions.chip = options.gpuChip;
+    nvvmTargetOptions.features = options.gpuFeatures;
+    pm.addPass(createGpuNVVMAttachTarget(nvvmTargetOptions));
     pm.addPass(createGpuToLLVMConversionPass());
+    pm.addPass(createGpuModuleToBinaryPass());
----------------
aartbik wrote:

Very nice! Thank you!

With an MLIR build that has NVPTX properly enabled, I can debug this much better with these flags. Note that the sparse compiler has a CUDA codegen path (CUDA threads are generated for the outermost loops of sparse code) and a CUDA libgen path (cuSPARSE, cuSPARSELt). For the former, my desktop GPU (Quadro P1000) used to be able to run the "codegen" path with the cubin pass under the given flags, even though sm_80 was a bit too high. By adjusting the test for my desktop, I see a clean run again:

Tool invocation for module: "sparse_kernels"
ptxas -arch sm_50 /tmp/mlir-sparse_kernels-nvptx64-nvidia-cuda-sm_50-05ebde.ptx -o /tmp/mlir-sparse_kernels-nvptx64-nvidia-cuda-sm_50-05ebde.ptx.cubin --opt-level 2
fatbinary -64 --image3=kind=elf,sm=50,file=/tmp/mlir-sparse_kernels-nvptx64-nvidia-cuda-sm_50-05ebde.ptx.cubin --image3=kind=ptx,sm=50,file=/tmp/mlir-sparse_kernels-nvptx64-nvidia-cuda-sm_50-05ebde.ptx --create /tmp/mlir-sparse_kernels-nvptx64-nvidia-cuda-sm_50-290575.bin

-- JIT main !

( 87360, 89440, 91520, 93600, 95680, 97760, 99840, 101920, 104000, 106080, 108160, 110240, 112320, 114400, 116480, 118560, 120640, 122720, 124800, 126880, 128960, 131040, 133120, 135200, 137280, 139360, 141440, 143520, 145600, 147680, 149760, 151840, 153920, 156000, 158080, 160160, 162240, 164320, 166400, 168480, 170560, 172640, 174720, 176800, 178880, 180960, 183040, 185120, 187200, 189280, 191360, 193440, 195520, 197600, 199680, 201760, 203840, 205920, 208000, 210080, 212160, 214240, 216320, 218400 )
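
For readers who want to adapt the test to a different GPU, the two tool invocations in the log above follow a simple pattern in the chip name. The sketch below only reproduces that pattern as seen in this log. The helper names `ptxasCommand` and `fatbinaryCommand` are hypothetical and are not part of MLIR. How the pass actually assembles these command lines internally is not shown in this thread, and the file names passed in are placeholders.

```cpp
#include <string>

// Hypothetical helper (not an MLIR API): rebuilds the ptxas command line
// in the shape shown in the log, given a chip such as "sm_50" and the
// path of the PTX file. The optimization level defaults to 2, matching
// the log above.
std::string ptxasCommand(const std::string &chip, const std::string &ptxFile,
                         int optLevel = 2) {
  return "ptxas -arch " + chip + " " + ptxFile + " -o " + ptxFile +
         ".cubin --opt-level " + std::to_string(optLevel);
}

// Hypothetical helper (not an MLIR API): rebuilds the fatbinary command
// line in the shape shown in the log. It embeds both the cubin (kind=elf)
// and the PTX (kind=ptx) for the same SM version.
std::string fatbinaryCommand(const std::string &chip,
                             const std::string &ptxFile,
                             const std::string &outFile) {
  // Strip the "sm_" prefix if present: "sm_50" becomes "50".
  const std::string sm = chip.rfind("sm_", 0) == 0 ? chip.substr(3) : chip;
  return "fatbinary -64 --image3=kind=elf,sm=" + sm + ",file=" + ptxFile +
         ".cubin --image3=kind=ptx,sm=" + sm + ",file=" + ptxFile +
         " --create " + outFile;
}
```

Calling `ptxasCommand("sm_50", path)` with the PTX path from the log yields the `ptxas` line shown above; passing a different chip string changes only the `-arch` and `sm=` fields.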

Note that we still have an issue in our Blaze-based setup getting the nvptxcompiler in place, but that is another story ;-)


https://github.com/llvm/llvm-project/pull/65857