[llvm] [Offload] Add olGetKernelMaxGroupSize (PR #142950)
Ross Brunton via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 9 08:49:03 PDT 2025
RossBrunton wrote:
@jhuber6 For some context, I'm looking to implement urKernelGetSuggestedLocalWorkSize ( https://oneapi-src.github.io/unified-runtime/core/api.html#urkernelgetsuggestedlocalworksize ), which does a lot of magic but, for the CUDA backend, boils down to a call to `cuOccupancyMaxPotentialBlockSize`. That function takes a kernel and a shared memory size (i.e. the dynamic memory used) and returns the maximum number of work items that can fit in a group on the device. An equivalent offload API would therefore also need to take a kernel and a memory size, meaning it can't use the normal (hypothetical) olGetKernelInfo interface.
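For reference, a minimal sketch of the underlying CUDA driver call (the kernel handle is assumed to be loaded elsewhere; error handling is reduced to a fallback value):

```c
#include <cuda.h>

// Sketch: query the largest block (work-group) size that still achieves
// good occupancy for Func, given DynSMemSize bytes of dynamic shared memory.
int suggestedGroupSize(CUfunction Func, size_t DynSMemSize) {
  int MinGridSize = 0, BlockSize = 0;
  CUresult Res = cuOccupancyMaxPotentialBlockSize(
      &MinGridSize, &BlockSize, Func,
      /*blockSizeToDynamicSMemSize=*/NULL, DynSMemSize,
      /*blockSizeLimit=*/0);
  return Res == CUDA_SUCCESS ? BlockSize : 0;
}
```

The extra `DynSMemSize` input is exactly what keeps this from fitting a plain key/value kernel-info query.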
Unless we want to rethink how we store kernels, I think a dedicated `olGetKernelMaxGroupSize` function is the best option.
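Roughly the shape I have in mind (names and parameter order here are illustrative, not the final signature from the PR):

```c
// Hypothetical sketch: given a kernel and its dynamic shared-memory usage,
// report the largest group size the device can launch it with.
ol_result_t olGetKernelMaxGroupSize(ol_device_handle_t Device,
                                    ol_kernel_handle_t Kernel,
                                    size_t DynamicMemSize,
                                    size_t *GroupSize);
```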
https://github.com/llvm/llvm-project/pull/142950