[llvm] [Offload] Add olGetKernelMaxGroupSize (PR #142950)
Ross Brunton via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 25 08:04:14 PDT 2025
RossBrunton wrote:
@jhuber6 I think a more appropriate comparison would be the [`cuOccupancyMaxPotentialBlockSize`](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__OCCUPANCY.html#group__CUDA__OCCUPANCY_1gf179c4ab78962a8468e41c3f57851f03) CUDA function or [clGetKernelSuggestedLocalWorkSizeKHR](https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/clGetKernelSuggestedLocalWorkSizeKHR.html). As I understand it, `CL_DEVICE_MAX_WORK_GROUP_SIZE` is the device-wide maximum work-group size across all kernels, while an individual kernel that uses a lot of resources (e.g. registers or shared memory) may have a smaller per-kernel limit.
I'm looking to implement [urKernelGetSuggestedLocalWorkSize](https://oneapi-src.github.io/unified-runtime/core/api.html#urkernelgetsuggestedlocalworksize) on top of liboffload. The CUDA UR backend uses `cuOccupancyMaxPotentialBlockSize`, so I wanted a way to expose that functionality through liboffload.
https://github.com/llvm/llvm-project/pull/142950