[llvm] [Offload] Add olLaunchKernelSuggestedGroupSize (PR #142130)

Mon Jun 2 09:17:30 PDT 2025

callumfare wrote:

The `PreferredNumThreads` value looks like it comes from the KernelEnvironment, which is an OpenMP-specific thing, so I don't think we can use it (if that's the case, maybe it could be lifted out of the plugin interface and into libomptarget itself). The liboffload path never has the KernelEnvironment set.

Either way, the UR implementation of urKernelSuggestedWorkSize uses `cuOccupancyMaxPotentialBlockSize` and `hipModuleOccupancyMaxPotentialBlockSize` for CUDA and HIP respectively. I think at the liboffload level we should just expose those to the user, or something similar. I don't know if there's an HSA equivalent of the HIP function, but it should be possible to implement something like it. That way we leave it up to the language runtimes to reach their own conclusions about the best sizes with the information available to them.
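
To make the split concrete, here is a rough sketch of the shape I have in mind, not a proposed API. Every name in it (`OccupancySuggestion`, `OccupancyQueryFn`, `chooseGroupSize`) is hypothetical and does not exist in liboffload today. The assumption is that the offload layer only forwards the raw occupancy result from the backend, and the language runtime applies its own policy on top.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical: raw result of a backend occupancy query. On CUDA this would
// mirror the two outputs of cuOccupancyMaxPotentialBlockSize (minimum grid
// size and block size); this struct is not part of liboffload.
struct OccupancySuggestion {
  std::uint32_t MinGridSize;
  std::uint32_t BlockSize;
};

// Hypothetical: what the offload layer would expose for a given kernel.
// A plugin would implement this by forwarding to its native occupancy call.
using OccupancyQueryFn = std::function<OccupancySuggestion(
    std::size_t DynSharedMemBytes, std::uint32_t BlockSizeLimit)>;

// Runtime-side policy (lives in the language runtime, not in liboffload).
// This example policy only clamps the suggestion to the amount of work;
// a real runtime could also factor in whatever else it knows.
std::uint32_t chooseGroupSize(const OccupancyQueryFn &Query,
                              std::size_t GlobalSize,
                              std::size_t DynSharedMemBytes) {
  // A limit of 0 is used here to mean "no limit" (an assumption of this sketch).
  OccupancySuggestion S = Query(DynSharedMemBytes, /*BlockSizeLimit=*/0);
  // Do not use more threads per group than there are work items.
  std::uint32_t Size = static_cast<std::uint32_t>(
      std::min<std::size_t>(S.BlockSize, GlobalSize));
  // Always launch at least one thread per group.
  return std::max<std::uint32_t>(Size, 1);
}
```

With a backend that suggests a block size of 256, this policy would pick 256 for a global size of 1000 and 100 for a global size of 100. An OpenMP runtime could substitute a different policy without liboffload needing to know about it.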

https://github.com/llvm/llvm-project/pull/142130