[Openmp-commits] [PATCH] D132005: [OpenMP] Add non-blocking support for target nowait regions

Fri Dec 9 08:41:59 PST 2022

gValarini marked 3 inline comments as done and an inline comment as not done.
gValarini added a comment.

@jdoerfert there are some other comments pending. How should we proceed?

================
Comment at: openmp/libomptarget/src/interface.cpp:331
+      Int64Envar("OMPTARGET_QUERY_COUNT_THRESHOLD", 5),
+      Envar<float>("OMPTARGET_QUERY_COUNT_BACKOFF_FACTOR", 0.5f));
+
----------------
tianshilei1992 wrote:
> jdoerfert wrote:
> > Why is this thread_local? Should it be tied to the async info object with non-blocking tasks?
> Yeah, I'd expect the counter to be tied to a task.
@jdoerfert @tianshilei1992 my idea here is that we should be able to go back and forth between non-blocking and blocking synchronization in a per-thread manner depending on their own state.

 - If the thread is doing too much non-blocking synchronization (similar to spin-wait), we should block-wait at least once to save resources.
 - If the thread has just completed a target region, let it go through other target regions in a non-blocking manner for a while, possibly completing other tasks and enqueueing some more.

This way, we allow the runtime threads to adapt themselves to the target region payloads.

I see two possible problems in placing the counter in a per-task manner:

 # We can only go from non-blocking to blocking synchronization. Once the task is completed, the counter is destructed and won´t be used anymore, thus we lose its information for future task execution.
 # If we have many tasks with higher enough counters at the top of a thread's task queue, that same thread will be forced to only block-synchronize them, even though we could have tasks ready for completion at the bottom of the queue.

What do you think about the above points? Do they make sense?

================
Comment at: openmp/libomptarget/src/omptarget.cpp:73
+  SmallVector<PostProcFuncTy> Functions(
+      std::move(PostProcessingFunctions));
+
----------------
jdoerfert wrote:
> gValarini wrote:
> > jdoerfert wrote:
> > > gValarini wrote:
> > > > jdoerfert wrote:
> > > > > This is not a good idea, I think. The state of PostProcessingFunctions is undefined afterwards, even if this works in practice. Simply iterate PostProcessingFunctions and then clear it.
> > > > Uhm, really? When moving a `SmallVector` like this, wouldn't `PostProcessingFunctions` be emptied and all the values moved to `Functions`?
> > > The standard says PostPRocessingFunctions is in an unspecified state after the move. If we look at SmallVector we know the state but I don't understand why we need to rely on implementation defined behavior here at all. We don't save any lines and also no work by just iterating PostProcessingFunctions and then clearing it, no?
> > I was thinking about a future use case: if a post-processing function generates more asynchronous device-side operations. In this case, it may want to add a new post-processing function into the vector, but we cannot do that while iterating over it. The [[ https://en.cppreference.com/w/cpp/container/vector/push_back | spec ]] says that, if a push/emplace back resizes the vector, any previous iterator is invalid, which would make the loop invalid. I think that is the case for `SmallVector`s as well.
> > 
> > Right now, the three post-processing functions do not do that. They all do synchronous device operations. But if in the future, someone adds a post-processing function that does that, it cannot be blindly done without taking care of this situation.
> > 
> > Do you think we could leave this "unimplemented" right now? If so, I can do the iteration-then-clear approach.
> > 
> > Just a "side question": which standard does `SmallVector` follows? I am asking that because the [[ https://en.cppreference.com/w/cpp/container/vector/vector | STL ]] says that a vector is guaranteed to be empty after a move. If that is the case for `SmallVectors`, thus `PostProcessingFunctions` would be in a valid state, no?
> If you are worried about inserting, use
> ```for (int i = 0; i < C.size(); ++i)```
> 
> Anyhow, let's keep it this way for now but add an explicit clear at the end. Just to be sure (and explicit).
Uhm, yep, making it explicit is better. What do you think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132005/new/

https://reviews.llvm.org/D132005