[Openmp-commits] [PATCH] D82718: [OpenMP] Use primary context in CUDA plugin

Wed Jul 1 12:27:03 PDT 2020

grokos added a comment.

In D82718#2124420 <https://reviews.llvm.org/D82718#2124420>, @tianshilei1992 wrote:

> I did some investigation and now think `BLOCKING_SYNC` might be a good option here, but I would also like to hear from others.
> Basically we have three explicit options here: `SPIN`, `YIELD`, and `BLOCKING_SYNC`, plus the default `AUTO`.
>
> - `SPIN`: In most cases this is not a good option, but it might be better for a really tiny kernel.
> - `YIELD`: As mentioned in the CUDA documentation, it can increase latency but can also improve the performance of CPU threads doing work in parallel with the GPU. For OpenMP, however, yielding may not help much because thread oversubscription is not very common.
> - `BLOCKING_SYNC`: I guess it most likely works via interrupts. This is a balanced option that does not add much latency and does not waste CPU resources, especially considering that we may use unshackled threads for all target tasks.
> - `AUTO`: In fact, this option is used if no flag is specified. According to the CUDA documentation, the behavior depends on the number of CUDA contexts in the current process versus the number of logical processors in the system: either `SPIN` or `YIELD` is used. `BLOCKING_SYNC` will only be chosen on Tegra devices.

From your analysis, `BLOCKING_SYNC` looks best from an OpenMP perspective. We can make that the default and maybe introduce an env var `LIBOMPTARGET_CUDA_CTX_SCHED_POLICY` so the user can override it (this is better done in a separate patch).
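
A minimal sketch of how such an override could be parsed, purely for illustration. The accepted spellings (`AUTO`, `SPIN`, `YIELD`, `BLOCKING_SYNC`), the helper names, and the fallback to `BLOCKING_SYNC` on unset or unrecognized input are all assumptions, not anything in this patch. The enum values mirror the CUDA driver API's `CU_CTX_SCHED_*` constants so the sketch builds without `cuda.h`.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Local stand-ins that mirror the values of the CUDA driver API's
// CU_CTX_SCHED_* flags. Real plugin code would use the constants from
// <cuda.h> instead; these exist only to keep the sketch self-contained.
enum SchedPolicy : unsigned {
  SCHED_AUTO = 0x00,
  SCHED_SPIN = 0x01,
  SCHED_YIELD = 0x02,
  SCHED_BLOCKING_SYNC = 0x04,
};

// Hypothetical helper: map the env var's text to a scheduling policy.
// A null or unrecognized value falls back to BLOCKING_SYNC, the default
// proposed in this review (an assumption, not existing behavior).
SchedPolicy parseSchedPolicy(const char *Value) {
  if (!Value)
    return SCHED_BLOCKING_SYNC;

  // Upper-case a copy so the comparison is case-insensitive.
  std::string S(Value);
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::toupper(C));
  });

  if (S == "AUTO")
    return SCHED_AUTO;
  if (S == "SPIN")
    return SCHED_SPIN;
  if (S == "YIELD")
    return SCHED_YIELD;
  return SCHED_BLOCKING_SYNC;
}

// Hypothetical helper: read the override proposed in this comment.
SchedPolicy getSchedPolicyFromEnv() {
  return parseSchedPolicy(std::getenv("LIBOMPTARGET_CUDA_CTX_SCHED_POLICY"));
}
```

In the plugin, the resulting value would presumably be passed to `cuDevicePrimaryCtxSetFlags` before the primary context is retained, but where exactly that call belongs is left to the follow-up patch.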

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82718/new/

https://reviews.llvm.org/D82718