[Openmp-commits] [PATCH] D82718: [OpenMP] Use primary context in CUDA plugin
Ye Luo via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Jun 29 09:42:44 PDT 2020
ye-luo added a comment.
In D82718#2120235 <https://reviews.llvm.org/D82718#2120235>, @Hahnfeld wrote:
> (In general, all patches must be sent to the respective -commits list. This also makes feedback more likely.)
> > Retaining per device primary context is preferred to creating a context owned by the plugin.
> > CUDA driver API documentation recommends this.
> Do you have a link for this? From a users / admin perspective, my only concern is that libomptarget should only "block" devices that are actually used. This is important for interactive machines that are configured in exclusive mode. It looks like `cuDevicePrimaryCtxRetain` does this, but maybe you can test that it's indeed working this way?
1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." From https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate: "In most cases it is recommended to use cuDevicePrimaryCtxRetain." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. "The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX
libomptarget only engages a device if the user requests it. I have no problems running 6 MPI ranks on 6 GPUs within a single node. Each rank owns one GPU exclusively.
I don't fully understand what you mean by "blocking". I'm able to run my offload application and a Linux GUI on my desktop with only 1 GPU. So it is not blocking the GPU; the time slicing for sharing a GPU still works.
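
To make the "only engage a device when requested" behavior concrete, here is a minimal sketch of lazy, per-device context retention. It is not the code from this patch. `LazyPrimaryContexts`, `DeviceId` and `ContextHandle` are hypothetical names, and the retain/release callbacks are stand-ins so the sketch builds without CUDA; in a real plugin they would presumably wrap `cuDevicePrimaryCtxRetain` and `cuDevicePrimaryCtxRelease`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-ins for CUdevice / CUcontext (assumption: plain ints
// are enough to illustrate the bookkeeping).
using DeviceId = int;
using ContextHandle = int;

// Retains one context per device, and only on first use of that device.
class LazyPrimaryContexts {
public:
  using RetainFn = std::function<ContextHandle(DeviceId)>;
  using ReleaseFn = std::function<void(DeviceId)>;

  LazyPrimaryContexts(RetainFn retain, ReleaseFn release)
      : retain_(std::move(retain)), release_(std::move(release)) {
    assert(retain_ && release_ && "both callbacks must be callable");
  }

  // Non-copyable: a copy would release the same devices twice.
  LazyPrimaryContexts(const LazyPrimaryContexts &) = delete;
  LazyPrimaryContexts &operator=(const LazyPrimaryContexts &) = delete;

  // Release only the devices that were actually engaged.
  ~LazyPrimaryContexts() {
    for (const auto &entry : contexts_)
      release_(entry.first);
  }

  // First call for a device retains its context; later calls reuse it.
  ContextHandle get(DeviceId dev) {
    auto it = contexts_.find(dev);
    if (it == contexts_.end())
      it = contexts_.emplace(dev, retain_(dev)).first;
    return it->second;
  }

  // True if this process has retained a context on the device.
  bool engaged(DeviceId dev) const { return contexts_.count(dev) != 0; }

private:
  RetainFn retain_;
  ReleaseFn release_;
  std::map<DeviceId, ContextHandle> contexts_;
};
```

With this shape, a process that only ever calls `get(1)` never touches device 0, which is the property that matters for machines configured in exclusive mode.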