[Openmp-commits] [PATCH] D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Mar 19 06:31:54 PDT 2021

JonChesterfield added a comment.

In D98832#2635305 <https://reviews.llvm.org/D98832#2635305>, @dhruvachak wrote:

> ...
> Agreed. However, I don't see LDS usage in the metadata table in the image. Is it present there?

Yes, see https://llvm.org/docs/AMDGPUUsage.html for the list of what we can expect. What may not be obvious is that the metadata calls it ".group_segment_fixed_size". I don't know the origin of the terminology, maybe OpenCL?

> In theory, a very high sgpr count can limit the number of available workgroups if that's not factored in for determining the number of threads. But in practice, VGPRs tend to be the primary limiting factor. So perhaps we can start with using VGPRs for this purpose and have experience guide us in the future.

If I understand correctly, occupancy rules all look something like (resource available / resource used) == number simultaneous, where one of the resources tends to be limiting. Offhand, I think those are VGPR, SGPR and LDS (group segment). I think there's also an architecture-dependent upper bound on how many things can run at once even if they use very little of those, maybe 8 for gfx9 and 16 for gfx10.

If that's right, perhaps the calculation should look something like:

  uint vgpr_occupancy = vgpr_available / vgpr_used;
  uint sgpr_occupancy = sgpr_available / sgpr_used;
  uint lds_occupancy = lds_available / lds_used;
  uint limiting_occupancy = min(vgpr_occupancy, sgpr_occupancy, lds_occupancy);

and then we derive threadsPerGroup from that occupancy and the various other considerations.
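
A slightly fuller sketch of the same idea in C++ follows. The struct and function names are made up for illustration and are not from the plugin. It assumes the usage numbers come from the code object metadata and the limits from somewhere architecture specific, and it ignores allocation granularity and the fact that LDS is allocated per workgroup rather than per wavefront.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-kernel resource usage, e.g. read from the metadata keys
// .vgpr_count, .sgpr_count and .group_segment_fixed_size.
struct KernelUsage {
  uint32_t vgprs;
  uint32_t sgprs;
  uint32_t ldsBytes;
};

// Hypothetical device limits. Real values are architecture dependent.
struct DeviceLimits {
  uint32_t vgprs;
  uint32_t sgprs;
  uint32_t ldsBytes;
  uint32_t maxSimultaneous; // architectural cap, independent of usage
};

// How many instances fit when only one resource is considered. A kernel that
// uses none of the resource is limited only by the architectural cap.
static uint32_t fitCount(uint32_t available, uint32_t used, uint32_t cap) {
  return used == 0 ? cap : std::min(cap, available / used);
}

// Occupancy is set by whichever resource runs out first.
uint32_t limitingOccupancy(const KernelUsage &k, const DeviceLimits &d) {
  assert(d.maxSimultaneous > 0 && "device must allow at least one instance");
  return std::min({fitCount(d.vgprs, k.vgprs, d.maxSimultaneous),
                   fitCount(d.sgprs, k.sgprs, d.maxSimultaneous),
                   fitCount(d.ldsBytes, k.ldsBytes, d.maxSimultaneous)});
}
```

With placeholder limits of 256 VGPRs, 800 SGPRs, 64 KiB of LDS and a cap of 8, a kernel using 64 VGPRs, 40 SGPRs and 16 KiB of LDS would get an occupancy of 4, limited jointly by VGPRs and LDS.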

Comment at: openmp/libomptarget/plugins/amdgpu/src/rtl.cpp:823
       DeviceInfo.ThreadsPerGroup[device_id]) {
-    DeviceInfo.NumTeams[device_id] = DeviceInfo.ThreadsPerGroup[device_id];
+    DeviceInfo.NumThreads[device_id] = DeviceInfo.ThreadsPerGroup[device_id];
     DP("Default number of threads exceeds device limit, capping at %d\n",
This looks like a drive-by copy/paste error fix, maybe post that separately?

If you're currently uploading diffs through the GUI (judging by the missing-context comment), that's quite labour intensive. If you change to Arcanist, the flow becomes:

  git checkout main
  git checkout -b some_feature
  git add -u && git commit -m "message"
  arc diff main # opens an editor


