[llvm] [Offload] Add olLaunchKernelSuggestedGroupSize (PR #142130)

Tue Jun 3 05:03:35 PDT 2025

RossBrunton wrote:

On reflection, I think I've been going about this in the most confusing way possible.

Just to summarise, we want these three things:
* A way to query the device for the maximum/preferred number of threads total/in one dimension. Perhaps as an `olDeviceGetInfo` query.
* A way to take this thread count and convert it into dimensions for a given kernel and launch dimensions. Similar to `urKernelGetSuggestedLocalWorkSize`.
* An enqueue function that automatically calls `olKernelGetSuggestedLocalWorkSize` (or similar) to work out the local work size, rather than having the user specify it manually.
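
To make the second point concrete, here is a rough sketch of one way a thread budget could be turned into per-dimension group sizes. It is not liboffload or UR code: `suggestGroupSize` and `largestDivisorAtMost` are hypothetical names, and the greedy "fill x first, pick sizes that divide the global size evenly" policy is only an assumption. A real implementation would also need to respect per-dimension device limits and kernel-specific constraints.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a liboffload or UR function).
// Returns the largest divisor of `n` that is <= `limit`; assumes n >= 1.
static uint32_t largestDivisorAtMost(uint32_t n, uint32_t limit) {
  for (uint32_t d = (limit < n ? limit : n); d > 1; --d)
    if (n % d == 0)
      return d;
  return 1;
}

// Hypothetical sketch: split a total thread budget (`maxThreads`) across
// three dimensions, greedily filling x, then y, then z. Each group size
// divides the matching global size, and the product never exceeds the budget.
std::array<uint32_t, 3> suggestGroupSize(const std::array<uint32_t, 3> &global,
                                         uint32_t maxThreads) {
  std::array<uint32_t, 3> group{1, 1, 1};
  uint32_t budget = maxThreads == 0 ? 1 : maxThreads;
  for (std::size_t i = 0; i < 3; ++i) {
    if (global[i] == 0)
      continue; // Treat an empty dimension as size 1.
    group[i] = largestDivisorAtMost(global[i], budget);
    budget /= group[i]; // Remaining budget for the later dimensions.
  }
  return group;
}
```

With a budget of 256 threads, a 1024x1024x1 launch would get a group size of 256x1x1 under this policy, and a 7x4x2 launch with a budget of 16 would get 7x2x1.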

There is a question of how much of this should be done in liboffload and how much in users of liboffload (e.g. UR). To avoid duplicating code, I think it makes sense to provide this functionality in liboffload. What are your thoughts on this, @callumfare @jhuber6?

Regardless, I think doing the last step first is probably not the best approach. I'm going to mark this as a draft and work on the device info query for now.

https://github.com/llvm/llvm-project/pull/142130