[Openmp-commits] [PATCH] D132005: [OpenMP] Add non-blocking support for target nowait regions

Fri Dec 9 10:16:38 PST 2022

jdoerfert accepted this revision.
jdoerfert added a comment.
This revision is now accepted and ready to land.

LG

================
Comment at: openmp/libomptarget/src/interface.cpp:331
+      Int64Envar("OMPTARGET_QUERY_COUNT_THRESHOLD", 5),
+      Envar<float>("OMPTARGET_QUERY_COUNT_BACKOFF_FACTOR", 0.5f));
+
----------------
gValarini wrote:
> tianshilei1992 wrote:
> > jdoerfert wrote:
> > > Why is this thread_local? Should it be tied to the async info object with non-blocking tasks?
> > Yeah, I'd expect the counter to be tied to a task.
> @jdoerfert @tianshilei1992 my idea here is that we should be able to go back and forth between non-blocking and blocking synchronization in a per-thread manner depending on their own state.
> 
>  - If the thread is doing too much non-blocking synchronization (similar to spin-wait), we should block-wait at least once to save resources.
>  - If the thread has just completed a target region, let it go through other target regions in a non-blocking manner for a while, possibly completing other tasks and enqueueing some more.
> 
> This way, we allow the runtime threads to adapt themselves to the target region payloads.
> 
> I see two possible problems in placing the counter in a per-task manner:
> 
>  # We can only go from non-blocking to blocking synchronization. Once the task is completed, the counter is destructed and won´t be used anymore, thus we lose its information for future task execution.
>  # If we have many tasks with higher enough counters at the top of a thread's task queue, that same thread will be forced to only block-synchronize them, even though we could have tasks ready for completion at the bottom of the queue.
> 
> What do you think about the above points? Do they make sense?
> 
> 
Hm, it seems reasonable for your scenario. It is unclear what we should optimize for here. I'm OK with keeping it like this as it might be better for a "always blocking tasks" and "consistently mixed task" load. The per-task impl. would only be good if we have "totally independent tasks", I guess.

================
Comment at: openmp/libomptarget/src/omptarget.cpp:73
+  SmallVector<PostProcFuncTy> Functions(
+      std::move(PostProcessingFunctions));
+
----------------
gValarini wrote:
> jdoerfert wrote:
> > gValarini wrote:
> > > jdoerfert wrote:
> > > > gValarini wrote:
> > > > > jdoerfert wrote:
> > > > > > This is not a good idea, I think. The state of PostProcessingFunctions is undefined afterwards, even if this works in practice. Simply iterate PostProcessingFunctions and then clear it.
> > > > > Uhm, really? When moving a `SmallVector` like this, wouldn't `PostProcessingFunctions` be emptied and all the values moved to `Functions`?
> > > > The standard says PostPRocessingFunctions is in an unspecified state after the move. If we look at SmallVector we know the state but I don't understand why we need to rely on implementation defined behavior here at all. We don't save any lines and also no work by just iterating PostProcessingFunctions and then clearing it, no?
> > > I was thinking about a future use case: if a post-processing function generates more asynchronous device-side operations. In this case, it may want to add a new post-processing function into the vector, but we cannot do that while iterating over it. The [[ https://en.cppreference.com/w/cpp/container/vector/push_back | spec ]] says that, if a push/emplace back resizes the vector, any previous iterator is invalid, which would make the loop invalid. I think that is the case for `SmallVector`s as well.
> > > 
> > > Right now, the three post-processing functions do not do that. They all do synchronous device operations. But if in the future, someone adds a post-processing function that does that, it cannot be blindly done without taking care of this situation.
> > > 
> > > Do you think we could leave this "unimplemented" right now? If so, I can do the iteration-then-clear approach.
> > > 
> > > Just a "side question": which standard does `SmallVector` follows? I am asking that because the [[ https://en.cppreference.com/w/cpp/container/vector/vector | STL ]] says that a vector is guaranteed to be empty after a move. If that is the case for `SmallVectors`, thus `PostProcessingFunctions` would be in a valid state, no?
> > If you are worried about inserting, use
> > ```for (int i = 0; i < C.size(); ++i)```
> > 
> > Anyhow, let's keep it this way for now but add an explicit clear at the end. Just to be sure (and explicit).
> Uhm, yep, making it explicit is better. What do you think?
It's fine for now.

================
Comment at: openmp/libomptarget/src/omptarget.cpp:1205
                           const map_var_info_t HstPtrName = nullptr)
-        : Index(Index), HstPtrBegin(reinterpret_cast<const char *>(HstPtr)),
+        : Index(Index), HstPtrBegin(reinterpret_cast<char *>(HstPtr)),
           HstPtrEnd(HstPtrBegin + Size), AlignedSize(Size + Size % Alignment),
----------------
gValarini wrote:
> gValarini wrote:
> > jdoerfert wrote:
> > > Why are the const problematic?
> > In summary, we want to be able to move `PrivateArgumentManagerTy` instances into the post-processing lambdas, so their lifetime is automatically managed by them.
> > 
> > The problem is with how `llvm::SmallVector` implements its move constructor. Unfortunately, it is implemented as a move-assignment instead of a proper constructor, meaning it cannot be generated for structs with const members. If we replace `FirstPrivateArgInfo` with a `std::vector`, the problem does not happen because the STL properly implements a move constructor for vectors.
> > 
> > Since I think we do not want to use `std::vector` anymore, I just removed the const from the members, since they are not even accessible outside the `PrivateArgumentManagerTy` class.
> > 
> > What do you think of this approach?
> @jdoerfert any new comment on this?
It's ok to remove the const.

================
Comment at: openmp/libomptarget/src/omptarget.cpp:1517
+        return Ret;
+      });

----------------
gValarini wrote:
> gValarini wrote:
> > jdoerfert wrote:
> > > FWIW, `mutable` is really not my favorite way of handling things.
> > `mutable` was added because we need to call a non-const member function of `PrivateArgumentManager` (i.e., `free`).
> > 
> > I know that makes the lambda a function with an internal state since, multiple calls to it will generate different results, but I don´t know of another approach to it.
> > 
> > Maybe use `call_once` (IMHO, a little bit overkill) or remove the lambdas altogether and use another approach to store the post-processing functions and their payload. What do you think?
> @jdoerfert any new comment on this?
Add a TODO to look into this in the future.

================
Comment at: openmp/runtime/src/kmp_tasking.cpp:1145
+      KMP_ATOMIC_DEC(&__kmp_unexecuted_hidden_helper_tasks);
+    }
   }
----------------
gValarini wrote:
> gValarini wrote:
> > jdoerfert wrote:
> > > @tianshilei1992 you need to look at these changes.
> > Any comments on whether we can move the `__kmp_unexecuted_hidden_helper_tasks` decrement to this place?
> @tianshilei1992 is this correct?
I think @tianshilei1992 mentioned to me this should be fine.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132005/new/

https://reviews.llvm.org/D132005