[Openmp-commits] [PATCH] D82718: [OpenMP] Use primary context in CUDA plugin

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jun 29 10:48:22 PDT 2020


Hahnfeld added a comment.

In D82718#2120414 <https://reviews.llvm.org/D82718#2120414>, @ye-luo wrote:

> In D82718#2120235 <https://reviews.llvm.org/D82718#2120235>, @Hahnfeld wrote:
>
> > (In general, all patches must be sent to the respective -commits list. This also makes feedback more likely.)
>
>
> Sorry, I was not aware of the -commits lists. What are they used for exclusively? If there is a policy or instruction, please point me to it.
>  I added the OpenMP project tag. If you watch the OpenMP project on Differential, I expect you get notifications. Is that not the case?


It's in the very first part of https://www.llvm.org/docs/Phabricator.html, the primary usage documentation for Phabricator within LLVM. IIRC the list is added automatically if you select the LLVM repository and touch a file under `openmp/`. The project tags are a more recent invention, and I haven't seen a thread establishing them as the mandatory way to get updates on submitted patches.

In D82718#2120407 <https://reviews.llvm.org/D82718#2120407>, @ye-luo wrote:

> In D82718#2120235 <https://reviews.llvm.org/D82718#2120235>, @Hahnfeld wrote:
>
> > Do you have a link for this? From a user's / admin's perspective, my only concern is that libomptarget should only "block" devices that are actually used. This is important for interactive machines that are configured in exclusive mode. It looks like `cuDevicePrimaryCtxRetain` does this, but maybe you can test that it indeed works this way?
>
>
>
>
> 1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." From https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
> 2. Right under cuCtxCreate: "In most cases it is recommended to use cuDevicePrimaryCtxRetain." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
> 3. "The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX


Based on the documentation, using the primary context makes sense and I don't feel strongly that there's a need to keep the old behavior.
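
To make the switch concrete, here is a minimal driver-API sketch of the two approaches (just an illustration, not the actual libomptarget code; flags and error handling are simplified):

  #include <cuda.h>

  int main(void) {
    CUdevice Device;
    CUcontext Context;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&Device, 0) != CUDA_SUCCESS)
      return 1;

    // Old approach: create a private context for the plugin.
    //   cuCtxCreate(&Context, CU_CTX_SCHED_BLOCKING_SYNC, Device);

    // New approach: retain the primary context, which is shared with the
    // CUDA Runtime API and other libraries and is created lazily on first
    // retain.
    if (cuDevicePrimaryCtxRetain(&Context, Device) != CUDA_SUCCESS)
      return 1;
    if (cuCtxSetCurrent(Context) != CUDA_SUCCESS)
      return 1;

    // ... module loads, allocations, kernel launches ...

    cuDevicePrimaryCtxRelease(Device); // pair every retain with a release
    return 0;
  }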

> libomptarget only engages a device if the user requests it. I have no problem running 6 MPI ranks on 6 GPUs within a single node; each rank owns one GPU exclusively.
>  I don't fully understand what you mean by "blocking". I'm able to run my offload application plus a Linux GUI on my desktop with only 1 GPU, so it is not blocking the GPU; the time slicing of sharing a GPU still works.

Ok, I'll take your word that it's equivalent to the current behavior. With "blocking" I was referring to the driver configuration that ensures each device is only accessible by a single process (the second one to try will get an error). This broke often enough with MPI + PGI's OpenACC runtime in the past...
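
For reference, the exclusive mode I mean is the device compute mode, which a process can query through the driver API. A small illustrative sketch (not part of the patch):

  #include <cuda.h>
  #include <stdio.h>

  int main(void) {
    CUdevice Device;
    int Mode;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&Device, 0) != CUDA_SUCCESS)
      return 1;
    if (cuDeviceGetAttribute(&Mode, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE,
                             Device) != CUDA_SUCCESS)
      return 1;

    if (Mode == CU_COMPUTEMODE_EXCLUSIVE_PROCESS)
      printf("exclusive-process mode: only one process may hold a context\n");
    else if (Mode == CU_COMPUTEMODE_PROHIBITED)
      printf("prohibited mode: no process may create a context\n");
    else
      printf("default mode: the device is shared via time slicing\n");
    return 0;
  }

In exclusive-process mode the second process fails as soon as it creates (or retains) a context on the device, which is why it matters that contexts are only set up for devices libomptarget actually uses.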


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82718/new/

https://reviews.llvm.org/D82718





More information about the Openmp-commits mailing list