[Mlir-commits] [mlir] [MLIR][ExecutionEngine] Tolerate CUDA_ERROR_DEINITIALIZED in mgpuModuleUnload (PR #190563)

Sun Apr 5 17:40:20 PDT 2026

https://github.com/jaredhoberock created https://github.com/llvm/llvm-project/pull/190563

`mgpuModuleUnload` may be called from a global destructor (registered by `SelectObjectAttr`'s `appendToGlobalDtors`) after the CUDA primary context has already been destroyed during program shutdown. In this case, `cuModuleUnload` returns `CUDA_ERROR_DEINITIALIZED`, which is benign since the module's resources are already freed with the context.
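
The tolerated-error logic can be modeled without a GPU. The following is a minimal sketch only: `FakeResult`, `FakeDriver`, `shouldReport`, and `tolerantUnload` are hypothetical stand-ins invented for illustration, not CUDA driver or MLIR runtime APIs. It assumes that destroying the context also releases the module, which is the behavior described above.

```cpp
#include <cstdio>

// Hypothetical stand-ins for CUresult values; not the real CUDA driver enum.
enum class FakeResult { Success, Deinitialized, InvalidHandle };

// Hypothetical model of a driver whose context can be torn down before
// module cleanup runs.
struct FakeDriver {
  bool contextAlive = true;
  bool moduleLoaded = true;

  // Models context teardown at program exit: the module's resources are
  // released together with the context.
  void destroyContext() {
    contextAlive = false;
    moduleLoaded = false;
  }

  // Models a module unload: reports Deinitialized once the context is gone,
  // and InvalidHandle if the module was already unloaded.
  FakeResult unloadModule() {
    if (!contextAlive)
      return FakeResult::Deinitialized;
    if (!moduleLoaded)
      return FakeResult::InvalidHandle;
    moduleLoaded = false;
    return FakeResult::Success;
  }
};

// Returns true only for results that indicate a real problem.
bool shouldReport(FakeResult result) {
  return result != FakeResult::Success &&
         result != FakeResult::Deinitialized;
}

// Tolerant unload: returns true if an error was reported.
bool tolerantUnload(FakeDriver &driver) {
  FakeResult result = driver.unloadModule();
  if (shouldReport(result)) {
    std::fprintf(stderr, "module unload failed\n");
    return true;
  }
  return false;
}
```

In this model, calling `tolerantUnload` after `destroyContext` stays silent, while a genuine error such as unloading the same module twice on a live context is still reported.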

## Reproduction

Any program that uses `gpu.launch_func` and is AOT-compiled (via `mlir-translate --mlir-to-llvmir | llc | cc -lmlir_cuda_runtime`) will print `'cuModuleUnload(module)' failed with '<unknown>'` on exit. This is because `SelectObjectAttr` registers the module unload as a global destructor, which runs after the CUDA primary context is released.

This script reproduces the error message from `mgpuModuleUnload` on my system:

```bash
#!/bin/bash
set -e

LLVM_BUILD=${LLVM_BUILD:-$HOME/dev/git/llvm-project-22/build}

cat > /tmp/repro.mlir << 'MLIR'
func.func @main() {
  %c1 = arith.constant 1 : index
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%bsx = %c1, %bsy = %c1, %bsz = %c1) {
    gpu.terminator
  }
  return
}
MLIR

$LLVM_BUILD/bin/mlir-opt /tmp/repro.mlir \
  -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \
  | $LLVM_BUILD/bin/mlir-translate --mlir-to-llvmir -o /tmp/repro.ll

$LLVM_BUILD/bin/llc -relocation-model=pic -filetype=obj /tmp/repro.ll -o /tmp/repro.o

cc /tmp/repro.o \
  -L$LLVM_BUILD/lib -Wl,-rpath,$LLVM_BUILD/lib \
  -lmlir_cuda_runtime -lmlir_runner_utils -o /tmp/repro

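echo "Running:"
# Capture the exit status explicitly; under `set -e` a nonzero exit would
# otherwise abort the script before the status is printed.
status=0
/tmp/repro 2>&1 || status=$?
echo "Exit code: $status"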
```

## Context

This matches how other projects handle the same shutdown ordering issue:
- Clang CUDA (D48613) switched module cleanup from `__attribute__((destructor))` to `atexit()`
- GCC libgomp checks context validity before `cuModuleUnload`
- Apache TVM silently ignores `CUDA_ERROR_DEINITIALIZED` on module unload


From 703a90c6f1ab5c74f6d344bc26fefe6296ceb28d Mon Sep 17 00:00:00 2001
From: Jared Hoberock <jaredhoberock at gmail.com>
Date: Sun, 5 Apr 2026 19:25:47 -0500
Subject: [PATCH] [MLIR][ExecutionEngine] Tolerate CUDA_ERROR_DEINITIALIZED in
 mgpuModuleUnload

mgpuModuleUnload may be called from a global destructor (registered by
SelectObjectAttr's appendToGlobalDtors) after the CUDA primary context
has already been destroyed during program shutdown. In this case,
cuModuleUnload returns CUDA_ERROR_DEINITIALIZED, which is benign since
the module's resources are already freed with the context.

This matches the approach used by Clang's CUDA support, TVM, and GCC's
libgomp NVPTX plugin for the same shutdown ordering issue.
---
 mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp b/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
index 7bf6804902479..24c88b9fa587b 100644
--- a/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
+++ b/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
@@ -127,7 +127,12 @@ extern "C" MLIR_CUDA_WRAPPERS_EXPORT CUmodule mgpuModuleLoad(void *data) {
 }
 
 extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuModuleUnload(CUmodule module) {
-  CUDA_REPORT_IF_ERROR(cuModuleUnload(module));
+  // At program exit, the CUDA primary context may already be destroyed.
+  // CUDA_ERROR_DEINITIALIZED is benign — the module's resources are already
+  // freed with the context.
+  CUresult result = cuModuleUnload(module);
+  if (result != CUDA_SUCCESS && result != CUDA_ERROR_DEINITIALIZED)
+    CUDA_REPORT_IF_ERROR(result);
 }
 
 extern "C" MLIR_CUDA_WRAPPERS_EXPORT CUfunction