[Openmp-commits] [PATCH] D107656: [OpenMP] Use events and taskyield in target nowait task to unblock host threads

Shilei Tian via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sat Aug 7 17:05:32 PDT 2021


tianshilei1992 added inline comments.


================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1143
 
+  int recordEvent(const int DeviceId, __tgt_async_info *AsyncInfo) const {
+    CUstream Stream = reinterpret_cast<CUstream>(AsyncInfo->Queue);
----------------
This function goes too far. It contains:
1. Create an event;
2. Return the stream;
3. Nullify the queue pointer.
Considering `recordEvent` will be used in many other places, such as D104418, please separate it.


================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1189
+    }
+    Err = cuEventDestroy(Event);
+    if (Err != CUDA_SUCCESS) {
----------------
ye-luo wrote:
> tianshilei1992 wrote:
> > Event destroy worths a separate function. We could add a new return value such as `OFFLOAD_NOT_DONE` to indicate the event is not fulfilled. It is not a good idea to mix event query and event destroy.
> I'm trying to avoid adding stuff that is not immediately used.
> Similar to  Queue, the manipulation of Event is within the plugin and there are no need of APIs to create/destroy events from outside.
> 
> recordEvent is responsible to create and record an event.
> queryEvent is responsbile to query and destroy an event upon completion.
> 
> ```
> #define OFFLOAD_SUCCESS (0)
> #define OFFLOAD_FAIL (~0)
> ```
> This is what I found, not even enum. I don't see a clean way to extend OFFLOAD_NOT_DONE
> I think fixing the return style is not in the scope of this patch. Some design is needed for return values between plugin and device class and between device to omptarget.
> 
> In this case,
> OFFLOAD_FAIL is for errors reported by CUDA runtime. Event point to signal it is completed or still on going.
> This is what I found, not even enum. I don't see a clean way to extend OFFLOAD_NOT_DONE
It is because these values are used in both `libomptarget` (C++ API) and plugins (C API).

> I think fixing the return style is not in the scope of this patch. Some design is needed for return values between plugin and device class and between device to omptarget.
I don't doubt that but it's not good to "twist" the code to fit existing code if there is apparently a better way to do it. If one part needs to be extended to support new features, just do it in another patch and make this one depend on it.


================
Comment at: openmp/libomptarget/src/device.cpp:552
+  if (RTL->record_event) {
+    AsyncInfo.setEventSupported(true);
+    return RTL->record_event(RTLDeviceID, AsyncInfo);
----------------
Whether the event is supported is per-device, so no need to put one indicator it in every async info.


================
Comment at: openmp/libomptarget/src/omptarget.cpp:51
+          Ret = Device.queryEvent(*this);
+        } while (Ret == OFFLOAD_SUCCESS && AsyncInfo.Event);
+
----------------
ye-luo wrote:
> tianshilei1992 wrote:
> > I'm thinking we can actually do more here. For example, set a count for every task yield. When the count reaches a threshold, fall back to stream synchronize. The threshold can be configured via env and so on.
> This looks like an optimization which should be explored separately. I think I may use cuEventSynchronize.
If `cuEventSynchronize` is better than stream one (e.g. the synchronization is no longer just spinning but something similar to signal), it's worth to separate the patch with something like:
```
// launch kernel
// create event
// synchronize
```
And in CUDA plugin, the synchronize is event synchronize. Then apply this patch on that.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107656/new/

https://reviews.llvm.org/D107656



More information about the Openmp-commits mailing list