[llvm] [Offload] Allow CUDA Kernels to use arbitrarily large shared memory (PR #145963)
Joseph Huber via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 26 13:38:24 PDT 2025
================
@@ -160,6 +160,9 @@ struct CUDAKernelTy : public GenericKernelTy {
private:
/// The CUDA kernel function to execute.
CUfunction Func;
+ /// The maximum amount of dynamic shared memory per thread group. By default,
+ /// this is set to 48 KB.
+ mutable uint32_t MaxDynCGroupMemLimit = 49152;
----------------
jhuber6 wrote:
This shouldn't be mutable; we probably want to initialize it at kernel creation with the correct value from the `cuFuncGetAttributes` function. Alternatively, we could check that value every time we launch a kernel, though I don't know how much overhead that would add.
Making it mutable keeps the cached value up to date, I suppose, so it avoids redundant work if the function is called multiple times with a different opt-in. However, I'd say re-checking is required for correctness, because theoretically a user could modify the limit manually through an API, so it's probably best to play it safe.
https://github.com/llvm/llvm-project/pull/145963