[Openmp-commits] [PATCH] D135162: [OPENMP] New api ompx_get_team_procs(devid)

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Oct 17 13:33:25 PDT 2022


jdoerfert added a comment.

In D135162#3863163 <https://reviews.llvm.org/D135162#3863163>, @gregrodgers wrote:

> "Team can technically map to lots of things and soon they will. What about "thread groups"? Or even "sockets"?
> Also, there is some duplication I pointed out below."
>
> But teams are exactly what we are trying to address with ompx_get_team_procs().   How many physical things can a team "map to"/"run on"?   This is needed because a user (or runtime) may want to limit the number of teams created to utilize all (or some subset) of the number physical team processors.

I understand all that but as I said, CUs/SMs are not the only thing we can map teams on to. I could map 2 teams on each of them. Or one team one two of them (potentially with some compiler fixup logic). Etc.

Your use case is: Get me the number of processor groups so I can determine the number of teams I want. The hardware doesn't provide "team processors" but something else. And with a certain mapping in mind it makes sense to query the number of these things. However, conflating this with "teams" on the user level is problematic as "teams" can map in various ways, as mentioned.

> At one point I was thinking of utilizing the places API in OpenMP 5.2 spec chapter 18.3.   If you imagined a place was like a device and a place partition was like a CU (or nvidisa SM) then omp_get_partition_num_places().   One of several problems with using the places API is that it is internal, I need an external API to help in setting number of teams on the specified device id.

I can see external APIs to be useful for queries but setting the number should not be a thing. Getters reach the plugin managing the device and return the value of "hardware thing-ys".

> The other big problem with places api is that it is written for thread management to work with thread affinity.  I think overloading this with a team (group of threads) would get us in trouble fast.

Fair, I don't think this extension needs to unify such things.



================
Comment at: openmp/libomptarget/include/device.h:325
+  /// for AMD, this is number of CUs. Field used by ompx_get_team_procs(devid).
+  int32_t TeamProcs;
 
----------------
gregrodgers wrote:
> jdoerfert wrote:
> > It's unclear why we need to store this in two places, the plugins and here. Other device data only lives in the plugins, this should too.
> This is the value on the host DeviceTy that the getter and setter access.  The getter is for the new external api ompx_get_team_procs(devid).   The setter is called when the device is initialized and gets the value to set from the plugin which now stores the value in the DeviceData. 
But why do we need to store it in the plugin and here? No other information, e.g., max num threads, is stored twice. This just copies the value from the plugin once, which seem to provide little benefit.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135162/new/

https://reviews.llvm.org/D135162



More information about the Openmp-commits mailing list