[Openmp-commits] [PATCH] D132005: [OpenMP] Add non-blocking support for target nowait regions

Guilherme Valarini via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Oct 17 12:46:35 PDT 2022


gValarini marked 13 inline comments as done.
gValarini added inline comments.


================
Comment at: openmp/libomptarget/include/device.h:441
+  /// succeeds/fails. Must be called multiple times until AsyncInfo is
+  /// completed and AsyncInfo.isDone() returns true.
+  int32_t queryAsync(AsyncInfoTy &AsyncInfo);
----------------
jdoerfert wrote:
> Reading this I don't what this returns. SUCCESS if it completed and FAIL otherwise? Or FAIL only if something failed? Must be called multiple times is also unclear to me, I doubt that we should put that sentence here.
> 
> Update: I think I now understand the latter part but if I'm right we should change the interface.
> So, queryAsync is supposed to be called before isDone to make sure isDone returns the right value, correct? If so, we should not expose queryAsync to the user as there doesn't seem to be a reason to call it otherwise. Arguably, calling it doesn't provide information, just a state change, thus a secondary query is necessary.
I should probably update the documentation of this function to reflect the new one added to `omptargetplugin.h:__tgt_rtl_query_async`. That state the following:

> Queries for the completion of asynchronous operations. Instead of blocking the calling thread as __tgt_rtl_synchronize, the progress of the operations stored in AsyncInfo->Queue is queried in a non-blocking manner, partially advancing their execution. If all operations are completed, AsyncInfo->Queue is set to nullptr. If there are still pending operations, AsyncInfo->Queue is kept as a valid queue. In any case of success (i.e., successful query with/without completing all operations), return zero. Otherwise, return an error code.
> 

Thus, `queryAsync` (which calls `__tgt_rtl_query_async`), is a non-blocking version of `synchronize`. That means that we must call it multiple times until all operations are completed and the plugin invalidates the queue inside `AsyncInfo`. Here, `AsyncInfoTy::isDone` is just a helper function that indicates if the device side operations are completed or not based on said queue. We need to externalize `queryAsync` to its user `AsyncInfoTy` so it can call the non-blocking implementation.

Considering your comment, what do you think of making things more explicit by adding a flag pointer argument to `queryAsync` (and thus to __tgt_rtl_query_async) that returns true if all operations are completed and false otherwise?

```
  int32_t queryAsync(AsyncInfoTy &AsyncInfo, bool &IsCompleted);
```


================
Comment at: openmp/libomptarget/include/omptarget.h:217
   /// \returns OFFLOAD_FAIL or OFFLOAD_SUCCESS appropriately.
-  int synchronize();
+  int synchronize(SyncType SyncType = SyncType::BLOCKING);
 
----------------
jdoerfert wrote:
> Nit: Rename argument to avoid shadowing.
> 
> Make the version that takes the sync type, and probably the sync type, private. IsDone can call the private version, users only the blocking one.
Thanks, I am renaming the type name to `SyncTypeTy` to reflect the other ones.

Regarding the second comment, I don´t quite understand what you mean with:

> IsDone can call the private version, users only the blocking one.

`isDone` only checks if the operations inside an `AsyncInfoTy` instance are completed or not, it does not call any plugin function at all. Are you suggesting that we move all the non-blocking synchronization code into `isDone`? If so, this means we would have some code duplication regarding the post-processing functions due to two separate synchronization paths, but if you think that is better I can do it.



================
Comment at: openmp/libomptarget/include/omptarget.h:217
   /// \returns OFFLOAD_FAIL or OFFLOAD_SUCCESS appropriately.
-  int synchronize();
+  int synchronize(SyncType SyncType = SyncType::BLOCKING);
 
----------------
gValarini wrote:
> jdoerfert wrote:
> > Nit: Rename argument to avoid shadowing.
> > 
> > Make the version that takes the sync type, and probably the sync type, private. IsDone can call the private version, users only the blocking one.
> Thanks, I am renaming the type name to `SyncTypeTy` to reflect the other ones.
> 
> Regarding the second comment, I don´t quite understand what you mean with:
> 
> > IsDone can call the private version, users only the blocking one.
> 
> `isDone` only checks if the operations inside an `AsyncInfoTy` instance are completed or not, it does not call any plugin function at all. Are you suggesting that we move all the non-blocking synchronization code into `isDone`? If so, this means we would have some code duplication regarding the post-processing functions due to two separate synchronization paths, but if you think that is better I can do it.
> 
> Nit: Rename argument to avoid shadowing.
> 
> Make the version that takes the sync type, and probably the sync type, private. IsDone can call the private version, users only the blocking one.




================
Comment at: openmp/libomptarget/include/omptargetplugin.h:160
+// asynchronous operations, marking the internal queue as nullptr when
+// completed.
+int32_t __tgt_rtl_synchronize_async(int32_t ID, __tgt_async_info *AsyncInfo);
----------------
gValarini wrote:
> jdoerfert wrote:
> > Describe the return value, it's not clear what would be returned if it does or doesn't synchronize.
> Good point. I have updated both the documentation and the function name to better reflect what this new interface should do. Do you think it is more clear now?
@jdoerfert any comments on the new function and its doc?


================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1272
+      // Not ready streams must be considered as successful operations.
+      Err = CUDA_SUCCESS;
+    } else {
----------------
jdoerfert wrote:
> return OFFLOAD_SUCCESS;
> 
> will reduce indention and logic later on.
Perfect, done!


================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:1278
+      StreamPool[DeviceId]->release(
+          reinterpret_cast<CUstream>(AsyncInfo->Queue));
+      AsyncInfo->Queue = nullptr;
----------------
jdoerfert wrote:
> 
Thanks, done!


================
Comment at: openmp/libomptarget/src/interface.cpp:86
+  int Rc = OFFLOAD_SUCCESS;
+  if (Dispatch)
+    Rc = TargetDataFunction(Loc, Device, ArgNum, ArgsBase, Args, ArgSizes,
----------------
jdoerfert wrote:
> This is different but similar to the condition used in the other new "helper" below. I have the same concerns as there. When would we ever not call the target data function?
That was a code error. The target helper should also use the `Dispatch` variable as well. Thanks for noticing.


================
Comment at: openmp/libomptarget/src/interface.cpp:86
+  int Rc = OFFLOAD_SUCCESS;
+  if (Dispatch)
+    Rc = TargetDataFunction(Loc, Device, ArgNum, ArgsBase, Args, ArgSizes,
----------------
gValarini wrote:
> jdoerfert wrote:
> > This is different but similar to the condition used in the other new "helper" below. I have the same concerns as there. When would we ever not call the target data function?
> That was a code error. The target helper should also use the `Dispatch` variable as well. Thanks for noticing.
With the [[ https://discourse.llvm.org/t/rfc-re-scheduling-implicit-tasks-for-non-blocking-target-nowait-execution/64239 | RFC ]] implemented, we are now re-enqueuing the same task multiple times until all the device side operations are completed. Because of that, we may call the `__tgt_target_`* functions multiple times as well. Since we want to dispatch the operations only once, we call the target data functions only when the target task is first encountered. The next calls will only synchronize the operations instead of dispatching them again, that's why `AsyncInfo.synchronize` is always called right below it.


================
Comment at: openmp/libomptarget/src/interface.cpp:95
+  handleTargetOutcome(Rc == OFFLOAD_SUCCESS, Loc);
+}
+
----------------
jdoerfert wrote:
> Is `FromMapper` ever set to true? Did I miss that?
Nope, it is not. I am removing it from the arguments and always passing `false`.


================
Comment at: openmp/libomptarget/src/interface.cpp:252
+             AsyncInfoTy::SyncType SyncType = AsyncInfoTy::SyncType::BLOCKING,
+             bool Dispatch = true) {
   TIMESCOPE_WITH_IDENT(Loc);
----------------
jdoerfert wrote:
> There is more duplication in the callees to be moved here, no?
> The two last arguments could be omitted and grabbed from AsyncInfo, also in the above rewrite.
> Dispatch is not used?
Yep, that was an error on my part, `Dispatch` should be used instead of `AsyncInfo.isDone()`.

Regarding the other arguments, they are obtained from the wrapper `TaskAsyncInfoTy`, not from `AsyncInfoTy`. I can change that to unify the wrappers code and `AsyncInfoTy`, but I'll be probably putting too different responsibilities into the `AsyncInfoTy` struct. What do you think?


================
Comment at: openmp/libomptarget/src/interface.cpp:279
+  int Rc = OFFLOAD_SUCCESS;
+  if (AsyncInfo.isDone())
+    Rc = target(Loc, Device, HostPtr, Args->NumArgs, Args->ArgBasePtrs,
----------------
jdoerfert wrote:
> This I don't understand. Why do we have to wait to enqueue the kernel? And even if, how does this not accidentally skip the target region and we will never execute it at all? Long story short, I doubt the conditional here makes sense.
You are right, this was a leftover prior to the refactoring of the interface file. Although it worked, it did only because the queue pointer was null and `isDone` would return `true` at first. Replaced it with the `Dispatch` variable.


================
Comment at: openmp/libomptarget/src/omptarget.cpp:41-50
+
+  // Run any pending post-processing function registered on this async object.
+  if (Result == OFFLOAD_SUCCESS && isDone()) {
+    for (auto &PostProcFunc : PostProcessingFunctions) {
+      Result = PostProcFunc();
+      if (Result != OFFLOAD_SUCCESS)
+        break;
----------------
jdoerfert wrote:
> This would move to isDone below.
Ok, I can do it, no problem. But just to make sure I got it right with respect to the other comments, you are suggesting this so we would can `synchronize` for the blocking synchronization and `isDone` for the non-blocking one, correct? If that is so, just remember that for the current code, the PostProcessingFunctions must be called on both cases, so isDone would need to be called when blocking synchronizing as well.


================
Comment at: openmp/libomptarget/src/private.h:227
+
+    delete AsyncInfo;
+    __kmpc_omp_set_target_async_handle(ExecThreadID, NULL);
----------------
jdoerfert wrote:
> Should this be guarded by IsNew?
Nope, `IsNew` indicates that a new task-attached `AsyncInfo` has just been allocated, but not that we should deallocate it or not. The variable is primarily used to indicate that we must dispatch new operations to the new handle. Maybe I should rename it to just `ShouldDispatch`. Deallocation is always done when `AsyncInfo->isDone()` returns `true`, which is previously checked.


================
Comment at: openmp/libomptarget/src/private.h:240
+  /// synchronization is needed.
+  bool shouldDispatch() { return !IsNew; }
+};
----------------
jdoerfert wrote:
> I don't understand what we want/need this dispatch idea for. It seems to skip operations but I don't understand how we would not forget about them and go back. 
Here is the main idea: 

# The first time a target nowait (target inside an OpenMP task) is executed, a new `AsyncInfo` handle is created and stored in the OpenMP taskdata structured of said task. Since this is the first time we are executing it (which is detected by the `IsNew` variable), we dispatch the device side operations and populate the post-processing function array by calling the proper `omptarget.cpp` functions (e.g., `targetDataBegin`, `targetDataEnd`, ...).
# Afterward, if the device operations are still pending, the OpenMP task is re-enqueued for execution.
# Later, when the task is re-executed, the same outline function called at step 1 will be called again. Here we can recover the `AsyncInfo` handle from the OpenMP taskdata and just synchronize it. Since this time the handle is not new, we know the operations were already dispatched previously and we should not dispatch them again.

I have a [[ https://docs.google.com/presentation/d/1yFaIanlR4JccYFhrKe_30k17LgjvyvC6DkOgc1Hum5c/edit?usp=sharing | presentation]] that explains it on slides 19-24, but I believe I am failing to describe that in the code. I'll try to come up with some better documentation for this dispatch/synchronize idea.


================
Comment at: openmp/runtime/src/kmp_tasking.cpp:5157
+  void *handle = NULL;
+  kmp_info_t *thread = __kmp_thread_from_gtid(gtid);
+  kmp_taskdata_t *taskdata = thread->th.th_current_task;
----------------
gValarini wrote:
> tianshilei1992 wrote:
> > `libomptarget` (for now) doesn't require the thread must be an OpenMP thread. Using `libomp`'s gtid generally breaks. Either we add the requirement, which needs to be discussed further, or there it needs an alternative method to implement that. If `libomp` is executed on another fresh thread, a new root will be created.
> Uhm, I did not know about that. Although I think such a requirement makes sense, it may be out of the scope of this patch. 
> 
> What we could do is check if the current thread is registered inside `libomp` somehow, falling back to the current execution path that does not depend on the task team information. Do you know we can use `__kmpc_global_thread_num`'s return value to verify that? Maybe assert that returned `GTID` is valid and within a well-known range (e.g., [0, NUM_REGISTERED_OMP_THREADS]).
> 
> Just a note, NUM_REGISTERED_OMP_THREADS is not a valid variable. I just don't know where, or even if, such information is stored. Do you know where can I find this?
@tianshilei1992 any comments on this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132005/new/

https://reviews.llvm.org/D132005



More information about the Openmp-commits mailing list