[Openmp-commits] [PATCH] D50522: [OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD env var

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Aug 10 00:12:52 PDT 2018


Hahnfeld added a comment.

In https://reviews.llvm.org/D50522#1194805, @AlexEichenberger wrote:

> In https://reviews.llvm.org/D50522#1194210, @Hahnfeld wrote:
>
> > You are right, "available" is not defined in the standard. I've always thought of it as "plugged into the system", i.e. all devices that are visible to the CUDA runtime. That would match the current implementation of `omp_get_num_devices`, which is defined to return "the number of available target devices".
> >  Actually this behaviour would be important for us as we have our GPUs configured for exclusive access. So when there is already a process running, all other users get a runtime error. In that case it would be very helpful to have `libomptarget` abort the program.
>
>
> So you'd like to decide availability on first use? This is what your comment seems to imply, but I am not 100% sure.


No, I favor the implication "visible" -> "available", which is the same interpretation that `omp_get_num_devices` uses (in its current form).
If we implemented the behaviour of this patch ("successful offload to ONE device" -> "ALL devices available", "error on ONE device" -> "NO devices available"), we'd need to change the API methods. That would probably imply probing all devices at runtime startup, because after all we don't know which device the user is going to use. IIRC that was to be avoided; libomptarget uses lazy initialization at the moment.
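
Very roughly, implementing that would mean something like the following at startup (a sketch only; `countAvailableDevices` and `deviceIsUsable` are made-up names, not libomptarget code):

  // Sketch: probe every visible device eagerly at runtime startup.
  // deviceIsUsable() is hypothetical; in reality it would have to
  // initialize the plugin/device just to find out.
  static bool deviceIsUsable(int DeviceId);

  static int countAvailableDevices(int NumVisibleDevices) {
    int NumAvailable = 0;
    for (int D = 0; D < NumVisibleDevices; ++D)
      if (deviceIsUsable(D)) // forces initialization of device D
        ++NumAvailable;
    return NumAvailable;
  }
  // omp_get_num_devices() would then have to return this count instead of
  // the number of visible devices, defeating the lazy initialization.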

In https://reviews.llvm.org/D50522#1194808, @caomhin wrote:

> I think that our current interpretation of “available” for devices is
>  reasonable. There may be many reasons that a device is not available, even
>  if it is plugged in. Deciding that it is available because we were able to
>  use it seems the most dynamic method of determining this.


Suppose we have 2 devices plugged into the system, and the first one cannot be used (for whatever reason: hardware failure, exclusive configuration with somebody else already running, etc.).
Now a clever application sees the two devices (because `omp_get_num_devices()` returns 2) and does:

  #pragma omp parallel num_threads(omp_get_num_devices())
  {
    #pragma omp target device(omp_get_thread_num())
    { }
  }
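
For concreteness, the same snippet as a complete program (assuming the two-device setup described above, so `omp_get_num_devices()` returns at least 1; the empty target region is just a placeholder):

  #include <omp.h>
  #include <cstdio>

  int main() {
    // One host thread per device reported by the runtime.
    #pragma omp parallel num_threads(omp_get_num_devices())
    {
      // Each thread offloads an empty region to "its" device; in the
      // scenario above, device 0 fails and device 1 succeeds.
      #pragma omp target device(omp_get_thread_num())
      { }
    }
    std::printf("devices reported: %d\n", omp_get_num_devices());
    return 0;
  }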

I think the runtime behaviour with this patch depends on the execution order (and exposes a race condition in `handle_target_outcome` on `TargetOffloadPolicy`; let's ignore that for now):

- If `target device(0)` executes first, libomptarget will notice the error and silently disable offloading. All `target` regions will execute on the host.
- If, however, `target device(1)` executes first and returns successfully, libomptarget will raise `OMP_TARGET_OFFLOAD` to `MANDATORY` and will abort execution when catching the error of `target device(0)` (see the sketch below).
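
In other words, the decision logic effectively has this shape (my sketch of the pattern, not the code from the patch; the enum values are illustrative):

  #include <cstdlib>

  // Sketch of the check-then-act pattern described above.
  enum OffloadPolicy { POLICY_DEFAULT, POLICY_DISABLED, POLICY_MANDATORY };
  static OffloadPolicy TargetOffloadPolicy = POLICY_DEFAULT;

  static void handle_target_outcome(bool Success) {
    if (TargetOffloadPolicy == POLICY_DEFAULT) {
      // First outcome wins: success promotes to MANDATORY, failure
      // silently disables offloading. Two threads can both read
      // POLICY_DEFAULT here and then write conflicting values.
      TargetOffloadPolicy = Success ? POLICY_MANDATORY : POLICY_DISABLED;
    } else if (TargetOffloadPolicy == POLICY_MANDATORY && !Success) {
      std::abort(); // offloading was promised but failed
    }
  }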

I don't think that makes much sense. IMO the runtime should detect two devices ("visible" -> "available") and abort execution in all cases, independent of the execution order.


Repository:
  rOMP OpenMP

https://reviews.llvm.org/D50522
