[llvm] [Offload] Allow CUDA Kernels to use arbitrarily large shared memory (PR #145963)

Joseph Huber via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 26 13:38:24 PDT 2025


================
@@ -160,6 +160,9 @@ struct CUDAKernelTy : public GenericKernelTy {
 private:
   /// The CUDA kernel function to execute.
   CUfunction Func;
+  /// The maximum amount of dynamic shared memory per thread group. By default,
+  /// this is set to 48 KB.
+  mutable uint32_t MaxDynCGroupMemLimit = 49152;
----------------
jhuber6 wrote:

This shouldn't be `mutable`; we probably want to initialize it at kernel creation using the correct value from the `cuFuncGetAttributes` function. Alternatively, we could just check that value every time we launch a kernel, though I don't know how much overhead that would add.

Making it `mutable` keeps it up to date, I suppose, so it would avoid redundant work if we call the function multiple times with a different opt-in. However, I'd say re-checking is required for correctness, because theoretically a user could modify the attribute manually through an API, so it's probably best to just play it safe.
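For context, a minimal sketch of the launch-time opt-in being discussed, using the CUDA driver API. The helper name `launchWithDynSharedMem` and its parameters are illustrative, not from the patch; the pattern is: when a launch requests more dynamic shared memory than the cached limit, opt in via `cuFuncSetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES` and update the cached value so later launches skip the redundant call.

```cuda
#include <cuda.h>
#include <cstdint>

// Hypothetical helper: launch Func with DynSharedBytes of dynamic shared
// memory, raising the kernel's opt-in limit on demand. MaxDynCGroupMemLimit
// caches the last limit we set (48 KB by default on most architectures).
static CUresult launchWithDynSharedMem(CUfunction Func,
                                       uint32_t &MaxDynCGroupMemLimit,
                                       uint32_t DynSharedBytes, CUstream Stream,
                                       unsigned GridX, unsigned BlockX,
                                       void **Args) {
  // Opt in only when the request exceeds the cached limit; the attribute
  // persists on the CUfunction, so subsequent launches need no extra call.
  if (DynSharedBytes > MaxDynCGroupMemLimit) {
    if (CUresult Err = cuFuncSetAttribute(
            Func, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
            DynSharedBytes))
      return Err;
    MaxDynCGroupMemLimit = DynSharedBytes;
  }
  return cuLaunchKernel(Func, GridX, /*GridY=*/1, /*GridZ=*/1, BlockX,
                        /*BlockY=*/1, /*BlockZ=*/1, DynSharedBytes, Stream,
                        Args, /*extra=*/nullptr);
}
```

The caching is exactly what makes the member `mutable` in the patch; the alternative raised above would drop the cache and query the current value with `cuFuncGetAttribute(&Val, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, Func)` on every launch instead.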

https://github.com/llvm/llvm-project/pull/145963


More information about the llvm-commits mailing list