[Openmp-commits] [openmp] [OpenMP] Reorganize the initialization of `PluginAdaptorTy` (PR #74397)
Jan André Reuter via Openmp-commits
openmp-commits at lists.llvm.org
Mon Dec 18 00:36:07 PST 2023
Thyre wrote:
> After this reorganization, all available devices are getting initialized eagerly through PluginAdaptorTy::initDevices. Before this patch, the device initialization would occur on demand through __tgt_target_kernel. While I don't know whether this leads to any measurable performance degradation, quite a bit of unnecessary work is being done now. For example, on an 8-GPU system, we are initializing all 8 devices now whereas previously we would initialize only 1 device (assuming that's all that was used by the program). Any reason why all devices are initialized eagerly?
>
> Additionally, this could have implications for overheads incurred when OMPT-based tools are involved. According to the spec, the runtime shall invoke the device_initialize callback defined by the tool when a device is initialized. So after this patch, this callback will be invoked for all devices which in turn could do quite a bit of unnecessary metadata setup, etc.
>
> I noticed another change because of this reorganization. A device is now getting initialized before main is called whereas previously it would be called from main through __tgt_target_kernel. This implies that the device initialization callback will now be called before main. While the spec does not say when this callback should be invoked, this change could cause problems for tools if static data structures were used by the tool. Perhaps this change is ok since the spec does not mandate any order. But again, is there a reason why this change was made?
>
> @jdoerfert
I guess having all devices initialized early might help with https://github.com/ROCm/ROCm/issues/2057. However, we stopped relying on the returned value a while ago, so this doesn't matter for us.
As for the potential additional overhead with OMPT tools, I'll test how this PR affects Score-P. Looking at our code, we initialize every reported device for measurement, including its locations (if tracing is available). Users will therefore see empty locations for devices that are never used. This isn't the case with our other accelerator adapters.
The initialization before main worries me in particular, since #69318 is triggered when a tool performs certain initialization during OMPT initialization while offloading is in use.
https://github.com/llvm/llvm-project/pull/74397