[llvm] [Offload] Add olLaunchKernelSuggestedGroupSize (PR #142130)
    Callum Fare via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Mon Jun  2 09:17:30 PDT 2025
    
    
  
callumfare wrote:
The `PreferredNumThreads` value looks like it comes from the KernelEnvironment, which is an OpenMP-specific thing, so I don't think we can use it. (If that's the case, maybe it could be lifted out of the plugin interface and into libomptarget itself.) The liboffload path never has the KernelEnvironment set.
Either way, the UR implementation of `urKernelSuggestedWorkSize` uses `cuOccupancyMaxPotentialBlockSize` and `hipModuleOccupancyMaxPotentialBlockSize` for CUDA and HIP respectively. I think at the liboffload level we should just expose those to the user, or something similar. I don't know if there's an HSA equivalent of the HIP function, but it should be possible to implement something like it. That way we leave it up to the language runtimes to reach their own conclusions about the best sizes with the information available to them.
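
To make the proposed split concrete, here is a rough sketch of what it could look like. Every name in it (`SuggestedLaunch`, `OccupancyQuery`, `fakeOccupancyQuery`, `chooseNumGroups`) is hypothetical and not part of liboffload or the plugin interface. The stand-in backend returns made-up numbers where a real CUDA or HIP plugin would forward to `cuOccupancyMaxPotentialBlockSize` or `hipModuleOccupancyMaxPotentialBlockSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical result type: what an occupancy query might hand back.
struct SuggestedLaunch {
  int MinGridSize; // smallest grid that still reaches maximum occupancy
  int BlockSize;   // block (work-group) size suggested by the backend
};

// Hypothetical backend hook. A CUDA plugin would presumably implement this
// with cuOccupancyMaxPotentialBlockSize, a HIP plugin with
// hipModuleOccupancyMaxPotentialBlockSize.
using OccupancyQuery =
    std::function<SuggestedLaunch(std::size_t DynSharedMem, int BlockSizeLimit)>;

// Stand-in backend for illustration only: pretends the device prefers
// 256 threads per block. A BlockSizeLimit of 0 means "no limit".
inline SuggestedLaunch fakeOccupancyQuery(std::size_t /*DynSharedMem*/,
                                          int BlockSizeLimit) {
  int Block = 256; // made-up value, not queried from any device
  if (BlockSizeLimit > 0 && BlockSizeLimit < Block)
    Block = BlockSizeLimit;
  return {/*MinGridSize=*/40, Block};
}

// Policy that would live in the language runtime, not in liboffload:
// take the raw suggestion and decide how many groups cover GlobalSize items.
inline std::size_t chooseNumGroups(const OccupancyQuery &Query,
                                   std::size_t GlobalSize,
                                   int BlockSizeLimit = 0) {
  SuggestedLaunch S = Query(/*DynSharedMem=*/0, BlockSizeLimit);
  assert(S.BlockSize > 0 && "backend must suggest a positive block size");
  std::size_t Block = static_cast<std::size_t>(S.BlockSize);
  return (GlobalSize + Block - 1) / Block; // round up
}
```

The point of this shape is that the offload layer only reports what the driver says, and each runtime (OpenMP, SYCL, ...) applies its own policy on top.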
https://github.com/llvm/llvm-project/pull/142130
    
    
More information about the llvm-commits
mailing list