[Openmp-commits] [PATCH] D135162: [OPENMP] New api ompx_get_team_procs(devid)
Greg Rodgers via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Oct 17 13:21:13 PDT 2022
gregrodgers added a comment.
"Team can technically map to lots of things and soon they will. What about "thread groups"? Or even "sockets"?
Also, there is some duplication I pointed out below."
But teams are exactly what we are trying to address with ompx_get_team_procs(). How many physical things can a team "map to" or "run on"? This is needed because a user (or runtime) may want to limit the number of teams created so that they use all (or some subset) of the physical team processors. At one point I was thinking of using the places API in OpenMP 5.2 spec chapter 18.3. If you imagine a place as a device and a place partition as a CU (or NVIDIA SM), then omp_get_partition_num_places() would be the closest analogue. One of several problems with using the places API is that it is internal; I need an external API to help in setting the number of teams on the specified device id.
The other big problem with the places API is that it is written for thread management, to work with thread affinity. I think overloading this with a team (group of threads) would get us in trouble fast.
I realize that the number of teams is typically application specific (or should be) and that having this API may be a gun aimed at one's foot. But when cross-team coordination is necessary, it can be beneficial to limit the number of teams so as to utilize all the hardware while minimizing the coordination among teams.
I addressed both of your inline comments, one with a fix and the other with an explanation.
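To make the intended use concrete, here is a minimal sketch of how a caller might cap the team count by the number of physical team processors. The names stubGetTeamProcs and chooseNumTeams, and the value 104, are hypothetical and only for illustration; stubGetTeamProcs stands in for the proposed ompx_get_team_procs(devid), whose exact signature is not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the proposed ompx_get_team_procs(devid).
// The returned count is made up for illustration only.
inline int32_t stubGetTeamProcs(int /*DevId*/) { return 104; }

// Hypothetical helper: choose a team count that never exceeds the amount of
// work and never exceeds TeamsPerProc teams per physical team processor.
inline int32_t chooseNumTeams(int64_t WorkItems, int32_t TeamProcs,
                              int32_t TeamsPerProc = 1) {
  assert(TeamProcs > 0 && TeamsPerProc > 0);
  const int64_t Cap = static_cast<int64_t>(TeamProcs) * TeamsPerProc;
  // Always launch at least one team, even when there is no work.
  return static_cast<int32_t>(
      std::max<int64_t>(1, std::min<int64_t>(WorkItems, Cap)));
}
```

The result could then be passed to a num_teams clause on a target teams construct, so that every team processor is used while the number of teams that must coordinate stays bounded.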
================
Comment at: openmp/libomptarget/include/device.h:325
+ /// for AMD, this is number of CUs. Field used by ompx_get_team_procs(devid).
+ int32_t TeamProcs;
----------------
jdoerfert wrote:
> It's unclear why we need to store this in two places, the plugins and here. Other device data only lives in the plugins, this should too.
This is the value on the host DeviceTy that the getter and setter access. The getter serves the new external API ompx_get_team_procs(devid). The setter is called when the device is initialized; it takes its value from the plugin, which now stores the value in DeviceData.
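The data flow described above can be sketched roughly as follows. This is not the patch's code: PluginDeviceData, HostDevice, setTeamProcs, getTeamProcs, and initDevice are hypothetical, reduced stand-ins, and only the field names TeamProcs and NumberOfTeamProcs come from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plugin-side record; the real DeviceData layout is not shown
// in this thread.
struct PluginDeviceData {
  int32_t NumberOfTeamProcs = 0;
};

// Hypothetical, heavily reduced host-side device record. The real DeviceTy
// has many more members.
struct HostDevice {
  int32_t TeamProcs = 0;

  void setTeamProcs(int32_t N) {
    assert(N >= 0);
    TeamProcs = N;
  }
  int32_t getTeamProcs() const { return TeamProcs; }
};

// At device initialization the host copies the value from the plugin once,
// so later queries do not need to call into the plugin.
inline void initDevice(HostDevice &Device, const PluginDeviceData &Plugin) {
  Device.setTeamProcs(Plugin.NumberOfTeamProcs);
}
```

Under these assumptions, the external query only reads the cached host-side value after initialization.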
================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:356
std::vector<std::vector<CUmodule>> Modules;
+ std::vector<int> NumberOfTeamProcs;
----------------
jdoerfert wrote:
> This should go into DeviceData, and in the new plugin interface it's different again.
Done.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135162/new/
https://reviews.llvm.org/D135162