[llvm] [Offload] Add olGetKernelMaxGroupSize (PR #142950)
Ross Brunton via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 9 08:49:03 PDT 2025
RossBrunton wrote:
@jhuber6 For some context, I'm looking to implement urKernelGetSuggestedLocalWorkSize ( https://oneapi-src.github.io/unified-runtime/core/api.html#urkernelgetsuggestedlocalworksize ), which does a lot of magic but, for the CUDA backend, boils down to a call to `cuOccupancyMaxPotentialBlockSize`. That function takes a kernel and a shared memory size (i.e. the dynamic memory used) and returns the maximum number of work items that can fit in a group on the device. An equivalent offload API would therefore also need to take a kernel and a memory size, meaning it can't use the normal (hypothetical) olGetKernelInfo interface.
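For reference, a minimal sketch of the underlying CUDA driver call (the kernel handle is assumed to be loaded elsewhere; error handling is reduced to a fallback value):

```c
#include <cuda.h>

// Sketch: query the largest block (work-group) size that still achieves
// good occupancy for Func, given DynSMemSize bytes of dynamic shared memory.
int suggestedGroupSize(CUfunction Func, size_t DynSMemSize) {
  int MinGridSize = 0, BlockSize = 0;
  CUresult Res = cuOccupancyMaxPotentialBlockSize(
      &MinGridSize, &BlockSize, Func,
      /*blockSizeToDynamicSMemSize=*/NULL, DynSMemSize,
      /*blockSizeLimit=*/0);
  return Res == CUDA_SUCCESS ? BlockSize : 0;
}
```

The extra `DynSMemSize` input is exactly what keeps this from fitting a plain key/value kernel-info query.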
Unless we want to rethink how we store kernels, I think a dedicated `olGetKernelMaxGroupSize` function is the best option.
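Roughly the shape I have in mind (names and parameter order here are illustrative, not the final signature from the PR):

```c
// Hypothetical sketch: given a kernel and its dynamic shared-memory usage,
// report the largest group size the device can launch it with.
ol_result_t olGetKernelMaxGroupSize(ol_device_handle_t Device,
                                    ol_kernel_handle_t Kernel,
                                    size_t DynamicMemSize,
                                    size_t *GroupSize);
```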
https://github.com/llvm/llvm-project/pull/142950