[Openmp-commits] [PATCH] D104418: [PoC][WIP][OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement

Joachim Protze via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Jun 17 01:05:14 PDT 2021


protze.joachim added a comment.

I must admit that I have no idea about the internals of libomptarget. But here is my high-level view on this issue:

>From my perspective, the compiler should help to distinguish two situations (a weakness of OpenMP is that the DAG only evolves during execution):
(1) target tasks with dependencies, which only synchronize with other target tasks on the same device (outgoing edges are limited to target tasks on the same device)
(2) target tasks with dependencies, which synchronize with host code or target tasks on other devices.
If the compiler cannot prove (1), fall back to (2).

If the compiler can prove, that a target task is of class (1), a fast path is possible by submitting all related device code onto the same device queue ("stream"). Similar, if the runtime can prove that outgoing edges are limited to the same device and no new edges might be added the runtime might identify additional target tasks of class (1). The latter might be difficult, because the information is in libomp not in libomptarget.

  #pragma omp target enter data map(to:a[:N]) depend(inout:a[0:N]) nowait
  #pragma omp target depend(inout:a[0:N]) nowait
  {...}
  #pragma omp target exit data map(from:a[:N]) depend(inout:a[0:N]) nowait
  #pragma omp taskwait

In this case, no host tasks are dependent on target task dependencies -> just submit the operations onto the same device queue.

Only if the host (possibly) has dependencies on the target task, fine-grained synchronization with the host is necessary:

  #pragma omp target enter data map(to:a[:N]) depend(inout:a[0:N]) nowait
  #pragma omp target depend(inout:a[0:N]) nowait
  {...}
  #pragma omp target exit data map(from:a[:N]) depend(inout:a[0:N]) nowait
  #pragma omp task depend(inout:a[0:N])
  {}

The first two target tasks, can still rely on device queue synchronization, the last target task must ensure synchronization with the host. I think, this could be mapped to the OpenMP completion event model:

  #pragma omp target exit data map(from:a[:N]) depend(inout:a[0:N]) detach(event) nowait
  #pragma omp task depend(inout:a[0:N])
  {}

Libomptarget "calls omp_fulfill_event(event)" when the exit data is completed, e.g., completion of asynchronous memcopy is signaled with a callback. This should result in a clean interface for synchronization between libomp and libomptarget.

I can see three candidates, where things might break at the moment:

- actions for target tasks with the same dependency are not submitted to the same device queue
- device queue don't enforce the necessary ordering for asynchronous execution (copy-to-device, kernel launch, copy-from-device)
- class (2) target tasks miss to ensure completion before they complete and release outgoing dependencies.

Especially the last point seems like a hot candidate, if the target task just launches the asynchronous execution, but does not ensure completion before the task is marked completed.
In that case, the implicit barrier at the end of `BlockMatMul_TargetNowait` could pass, although not all device activity is completed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104418/new/

https://reviews.llvm.org/D104418



More information about the Openmp-commits mailing list