[Openmp-commits] [PATCH] D104418: [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement

Ye Luo via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Aug 20 12:27:33 PDT 2021

ye-luo added a comment.

In D104418#2957609 <https://reviews.llvm.org/D104418#2957609>, @grokos wrote:

> In D104418#2957432 <https://reviews.llvm.org/D104418#2957432>, @ye-luo wrote:
>> There is a performance hit from frequent creating/destroying events at very transfer. The create/destroy logic can be separate from recording events.
> What an event is and how it is handled is plugin-specific. `createEvent` and `destroyEvent` can be implemented e.g. via a pool of events where an event will be picked upon `createEvent` and returned back to the pool upon `destroyEvent` (similar to what we do for streams in the CUDA plugin). So these performance considerations can be fixed in a subsequent patch. Right now there is a race condition - a correctness concern - that we should fix. At the very least, since the logic in the base library is correct and we pretty much agree on it, we should let this patch land without the CUDA plugin changes; then we can provide optimized implementations for the `__tgt_rtl_*_event` functions for the plugins we care about in future patches.

Now Event is stored in HostDataToTargetTy. It is not plugin-specific logic.

Consider the issues in the current implementation I just found. I have a strong feeling that the current fix should not get in.
I think we should just create one event per HostDataToTargetTy at the constructor and use the same event for a give mapped region.

In order to avoid blocking the good pieces in this patch, it is better to take out all the plugin API and CUDA plugin change to a separate patch.
I demand split createEvent into createEvent and recordEvent and also make each API doing well defined to minimal unit of work.

Using a pool of events is an optimization we may try at a later stage. Right now, I'm more concerned about the issue I found above instead of performance impact.

Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1362
+                void *EventPtr) const {
+    CUstream Stream = getStream(DeviceId, AsyncInfo);
+    CUevent Event = reinterpret_cast<CUevent>(EventPtr);
Why waiting for an event needs to pull a stream in.

Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1379
+  int destroyEvent(const int DeviceId, __tgt_async_info *AsyncInfo,
+                   void *EventPtr) const {
grokos wrote:
> `DeviceId` is not used in this function, so it doesn't need to be an argument.
AsyncInfo  is not needed. If the stream has been returned to the poo, you just print nullptr all the time.

Comment at: openmp/libomptarget/src/device.cpp:276
-    int Ret = submitData(TargetPointer, HstPtrBegin, Size, AsyncInfo);
+      int Ret = submitData(TargetPointer, HstPtrBegin, Size, AsyncInfo);
I think the purpose of introducing this data transfer event is to achieve the atomic transfer behavior.
So it is unsafe to issue transfer right away without checking the status of Event.
If a map(always comes from a different thread.

Comment at: openmp/libomptarget/src/device.cpp:295
+      Entry->unlock();
+      // If there is an event attached, destroy it.
+      if (OldEvent)
The current event never got destroyed unitl the next transfer.
1. event doesn't got properly destroyed.
2. almost like one event per map.

  rG LLVM Github Monorepo



More information about the Openmp-commits mailing list