[Openmp-commits] [PATCH] D107656: [OpenMP] Use events and taskyield in target nowait task to unblock host threads

Ye Luo via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sat Aug 7 17:42:50 PDT 2021


ye-luo added inline comments.


================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1143
 
+  int recordEvent(const int DeviceId, __tgt_async_info *AsyncInfo) const {
+    CUstream Stream = reinterpret_cast<CUstream>(AsyncInfo->Queue);
----------------
tianshilei1992 wrote:
> This function goes too far. It contains:
> 1. Create an event;
> 2. Return the stream;
> 3. Nullify the queue pointer.
> Considering `recordEvent` will be used in many other places, such as D104418, please separate it.
it is not clear to me how you would prefer the event manipulation API in the plugins to look like. Could you put up a separate patch by extracting those out of D104418 ? It seems that you need to expose an event in the API. Once you have that up, I can refactor/reorganize my side.


================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1189
+    }
+    Err = cuEventDestroy(Event);
+    if (Err != CUDA_SUCCESS) {
----------------
tianshilei1992 wrote:
> ye-luo wrote:
> > tianshilei1992 wrote:
> > > Event destroy worths a separate function. We could add a new return value such as `OFFLOAD_NOT_DONE` to indicate the event is not fulfilled. It is not a good idea to mix event query and event destroy.
> > I'm trying to avoid adding stuff that is not immediately used.
> > Similar to  Queue, the manipulation of Event is within the plugin and there are no need of APIs to create/destroy events from outside.
> > 
> > recordEvent is responsible to create and record an event.
> > queryEvent is responsbile to query and destroy an event upon completion.
> > 
> > ```
> > #define OFFLOAD_SUCCESS (0)
> > #define OFFLOAD_FAIL (~0)
> > ```
> > This is what I found, not even enum. I don't see a clean way to extend OFFLOAD_NOT_DONE
> > I think fixing the return style is not in the scope of this patch. Some design is needed for return values between plugin and device class and between device to omptarget.
> > 
> > In this case,
> > OFFLOAD_FAIL is for errors reported by CUDA runtime. Event point to signal it is completed or still on going.
> > This is what I found, not even enum. I don't see a clean way to extend OFFLOAD_NOT_DONE
> It is because these values are used in both `libomptarget` (C++ API) and plugins (C API).
> 
> > I think fixing the return style is not in the scope of this patch. Some design is needed for return values between plugin and device class and between device to omptarget.
> I don't doubt that but it's not good to "twist" the code to fit existing code if there is apparently a better way to do it. If one part needs to be extended to support new features, just do it in another patch and make this one depend on it.
OFFLOAD_NOT_DONE needs to come from the plugin. An enquiry needs to return 3 states. fail, done, not done. I'm wondering how to do it properly? Is there an example to follow?.


================
Comment at: openmp/libomptarget/src/device.cpp:552
+  if (RTL->record_event) {
+    AsyncInfo.setEventSupported(true);
+    return RTL->record_event(RTLDeviceID, AsyncInfo);
----------------
tianshilei1992 wrote:
> Whether the event is supported is per-device, so no need to put one indicator it in every async info.
Indeed I wanted to change that. Is DeviceTy and its constructor the right place to keep and initialize this flag?


================
Comment at: openmp/libomptarget/src/omptarget.cpp:51
+          Ret = Device.queryEvent(*this);
+        } while (Ret == OFFLOAD_SUCCESS && AsyncInfo.Event);
+
----------------
tianshilei1992 wrote:
> ye-luo wrote:
> > tianshilei1992 wrote:
> > > I'm thinking we can actually do more here. For example, set a count for every task yield. When the count reaches a threshold, fall back to stream synchronize. The threshold can be configured via env and so on.
> > This looks like an optimization which should be explored separately. I think I may use cuEventSynchronize.
> If `cuEventSynchronize` is better than stream one (e.g. the synchronization is no longer just spinning but something similar to signal), it's worth to separate the patch with something like:
> ```
> // launch kernel
> // create event
> // synchronize
> ```
> And in CUDA plugin, the synchronize is event synchronize. Then apply this patch on that.
Let us consolidate the API first. Any optimization further optimization should be deferred.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107656/new/

https://reviews.llvm.org/D107656



More information about the Openmp-commits mailing list