[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

Mon Sep 18 09:44:00 PDT 2023

================
@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
       AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
     if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-      AddGlobalCtor(CudaCtorFunction);
+      AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);
----------------
Artem-B wrote:

> User code in the Clang interpreter is also executed through global_ctors. This patch ensures kernels can be launched in the same iteration they are defined in by making the registration first in the list.

This sounds like an application-specific problem that may be addressable by lowering the priority of the user-code initializers, so that they run after the kernel registration.

In general, I'm very reluctant to change the initialization order to be different from what NVCC generates. We do need to interoperate with NVIDIA's libraries and the change in initialization order is potentially risky. Considering that we have no practical way to test it, and that it appears to address something that affects only one application (and may be dealt with on the app level), I do not think we should change the priority for the clang-generated kernel registration code.
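
To illustrate the app-level alternative, here is a minimal sketch of how explicit constructor priorities order initializers. It assumes the GCC/Clang `__attribute__((constructor(N)))` extension, which Clang lowers to `llvm.global_ctors` entries with the given priority (smaller numbers run earlier; entries added without an explicit priority default to 65535). The names `register_module`, `user_init`, and `init_order` are hypothetical stand-ins, not the actual CUDA registration code or clang-repl internals.

```cpp
#include <cassert>
#include <string>

// Plain zero-initialized storage, so it is valid before any constructor runs.
static char order_log[8];
static int order_len = 0;

static void record(char c) { order_log[order_len++] = c; }

// Hypothetical stand-in for user code. It is defined first in the source,
// but its larger priority number makes it run later.
__attribute__((constructor(1000))) static void user_init() { record('U'); }

// Hypothetical stand-in for module registration. Priorities 0-100 are
// reserved for the implementation, so the sketch uses 101.
__attribute__((constructor(101))) static void register_module() { record('R'); }

// Returns the order in which the two initializers ran.
std::string init_order() { return std::string(order_log, order_len); }
```

Calling `init_order()` from `main` should show the registration stand-in running before the user-code stand-in, even though the user-code initializer appears first in the source. Whether the interpreter can assign such priorities to user code is a separate question that this sketch does not answer.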



https://github.com/llvm/llvm-project/pull/66658