[llvm] [Offload] Allow CUDA Kernels to use arbitrarily large shared memory (PR #145963)

Johannes Doerfert via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 2 11:48:52 PDT 2025


================
@@ -1302,6 +1305,16 @@ Error CUDAKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
   if (GenericDevice.getRPCServer())
     GenericDevice.Plugin.getRPCServer().Thread->notify();
 
+  // In case we require more memory than the current limit.
+  if (MaxDynCGroupMem >= MaxDynCGroupMemLimit) {
----------------
jdoerfert wrote:

> Hmm, is this code dedicated for OpenMP?

What other code path would trigger this right now?

> This is part of why I think we should remove more OpenMP logic from the core library. It would be better to make this guarantee at the libomptarget layer, since we likely want to maintain opt-in behavior. Worst case we use yet another environment variable.

We can't do this in libomptarget. The kernel handle is not there, or we would start to unravel the abstraction layers. As for the other users: from all I can tell, this is equally useful for SYCL and for users driving the new API "manually", since they can't call anything else either. Only for users coming from CUDA/HIP, which is not actually supported right now, would this be unnecessary. Let's keep the restructuring somewhat down to earth. If this, and other things, become a problem for other users, we will add "option flags" that guard the logic. Without evidence for that, and without evidence that this isn't more generally useful *here*, let's not force separation for separation's sake.


https://github.com/llvm/llvm-project/pull/145963

