[Mlir-commits] [mlir] [mlir][gpu] Deprecate gpu::Serialization* passes. (PR #65857)
Fabian Mora
llvmlistbot at llvm.org
Tue Sep 12 18:42:29 PDT 2023
================
@@ -78,11 +78,13 @@ void mlir::sparse_tensor::buildSparseCompiler(
// Finalize GPU code generation.
if (gpuCodegen) {
-#if MLIR_GPU_TO_CUBIN_PASS_ENABLE
- pm.addNestedPass<gpu::GPUModuleOp>(createGpuSerializeToCubinPass(
- options.gpuTriple, options.gpuChip, options.gpuFeatures));
-#endif
+ GpuNVVMAttachTargetOptions nvvmTargetOptions;
+ nvvmTargetOptions.triple = options.gpuTriple;
+ nvvmTargetOptions.chip = options.gpuChip;
+ nvvmTargetOptions.features = options.gpuFeatures;
+ pm.addPass(createGpuNVVMAttachTarget(nvvmTargetOptions));
pm.addPass(createGpuToLLVMConversionPass());
+ pm.addPass(createGpuModuleToBinaryPass());
----------------
fabianmcg wrote:
Oh, I see, I think I know what's happening: the tests use something like `chip=sm_80` and are running on a device with a lower compute capability (`sm_70`).
Background: the compute capability was never used at any point in `--gpu-to-cubin`; it always compiled the code for the arch found at compile time, which is why the tests never caused issues before.
With this new method, there are 2 mechanisms:
- `format=bin` compiles only for the specified chip, and if there's an arch mismatch a `CUDA_ERROR_NO_BINARY_FOR_GPU` is thrown.
- `format=fatbin` generates a binary for the specified chip and also embeds the PTX, allowing the driver to JIT the code. However, something I found recently is that the driver can only JIT to a higher CC, so for example `chip=sm_50` can run on `sm_80`, but `chip=sm_80` cannot run on `sm_50`, and in this case one also gets `CUDA_ERROR_NO_BINARY_FOR_GPU`.
NOTE: `fatbin` is the default format.
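As a rough illustration of the two mechanisms above, here is a minimal C++ sketch that models only the rules stated in this comment. The names `parseSm`, `Format`, and `canLoad` are hypothetical and not part of MLIR or the CUDA driver API, and the real driver compatibility rules are finer-grained than this (for instance, architecture-suffixed chips are not handled here).

```cpp
#include <string>

// Hypothetical helper: turn "sm_80" into 80. Returns -1 for anything that is
// not "sm_" followed only by digits (e.g. suffixed or non-NVIDIA chip names).
int parseSm(const std::string &chip) {
  const std::string prefix = "sm_";
  if (chip.size() <= prefix.size() ||
      chip.compare(0, prefix.size(), prefix) != 0)
    return -1;
  int value = 0;
  for (std::size_t i = prefix.size(); i < chip.size(); ++i) {
    if (chip[i] < '0' || chip[i] > '9')
      return -1;
    value = value * 10 + (chip[i] - '0');
  }
  return value;
}

// The two serialization formats discussed in the comment.
enum class Format { Bin, Fatbin };

// Simplified model (an assumption, not the driver's actual logic):
//  - Bin:    loads only when the device matches the compiled chip.
//  - Fatbin: embedded PTX can be JIT-compiled for the same or a higher
//            compute capability, but never for a lower one.
// A `false` result stands in for CUDA_ERROR_NO_BINARY_FOR_GPU.
bool canLoad(Format format, const std::string &chip,
             const std::string &device) {
  const int target = parseSm(chip);
  const int actual = parseSm(device);
  if (target < 0 || actual < 0)
    return false;
  if (format == Format::Bin)
    return target == actual;
  return target <= actual;
}
```

Under this model, `canLoad(Format::Fatbin, "sm_80", "sm_70")` is false, which corresponds to the failing test configuration described above, while `canLoad(Format::Fatbin, "sm_50", "sm_80")` is true.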
The issue is this line [sparse-matmul-lib.mlir#L5](https://github.com/llvm/llvm-project/blob/main/mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-lib.mlir#L5).
How to solve? Use the lowest arch required for the test; that's why most integration tests in trunk use `sm_50`.
Please let me know if the above works. I think it will, because when I ran the tests on an A100 those tests passed.
Wrt blaze, I'm working on the JIT-only path, so stay tuned.
https://github.com/llvm/llvm-project/pull/65857