[Openmp-commits] [PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

Andrey Churbanov via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue May 18 15:00:47 PDT 2021


AndreyChurbanov added inline comments.


================
Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+        // link node as successor of all nodes in the prev_set if any
+        npredecessors +=
+            __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
----------------
protze.joachim wrote:
> AndreyChurbanov wrote:
> > protze.joachim wrote:
> > > If I understand this code right, this has O(n^2) complexity for two sets of size n?
> > > 
> > > Consider:
> > > ```
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(in:A)
> > > work(A,i);
> > > }
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(inoutset:A)
> > > work(A,i);
> > > }
> > > ```
> > > All n tasks in the first loop would be predecessor for each of the tasks in the second loop. This will also result in n^2 releases/notifications.
> > > 
> > > 
> > > I'd suggest to model the dependencies like:
> > > ```
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(in:A)
> > > work(A,i);
> > > }
> > > #pragma omp taskwait depend(inout:A) nowait
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(inoutset:A)
> > > work(A,i);
> > > }
> > > ```
> > > I.e., add a dummy dependency node between the two sets, which has n predecessors and n successors. You probably understand the depnode code better, than I do, but I think you would only need to add some code in line 357, where `last_set` is moved to `prev_set`.
> > > This dummy node would reduce linking and releasing complexity to O(n).
> > This could be a separate research project.  Because such change may cause performance regressions on some real codes, so it needs thorough investigation.  I mean insertion of an auxiliary dummy node into dependency graph is definitely an overhead, which may or may not be overridden by reduction in number of edges in the graph.  Or it can be made optional optimization under an external control, if there is a significant gain in some particular case.
> I don't suggest to insert the auxiliary node in the general case. Just for the case that you see a transition of `in` -> `inoutset` or vice versa.
> 
> With current task dependencies, you always have 1 node notifying n nodes (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an amortized O(1) operation per task in the graph.
> 
> Here you introduce a code pattern, where n nodes each will need to notify m other nodes. This leads to an O(n) operation per task. I'm really worried about the complexity of this pattern.
> If there is no use case with large n, I don't understand, why `inoutset` was introduced in the first place.
> I don't suggest to insert the auxiliary node in the general case. Just for the case that you see a transition of `in` -> `inoutset` or vice versa.
> 
> With current task dependencies, you always have 1 node notifying n nodes (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an amortized O(1) operation per task in the graph.
> 
> Here you introduce a code pattern, where n nodes each will need to notify m other nodes. This leads to an O(n) operation per task. I'm really worried about the complexity of this pattern.
No, I don't introduce the pattern, as it already worked for sets of tasks with `in` dependency following set of tasks with `mutexinoutset` dependency or vice versa.

> If there is no use case with large n, I don't understand, why `inoutset` was introduced in the first place.
I didn't like that `inoutset` was introduced as the clone of "in" in order to separate two (big) sets of mutually independent tasks, but this has already been added to specification, so we have to implement it. Of cause your suggestion can dramatically speed up dependency processing of some trivial case with two huge sets of tasks one wit `in` dependency another with `inoutset`; but it slightly changes the semantics of user's code, and actually can slowdown other cases.  I would let users do such optimizations.  Or compiler can do this once it sees big sets of tasks those don't have barrier-like "taskwait nowait".  For user this is one line of code, for compiler this is dozen lines of code, for runtime library this is pretty big change which does not worth efforts to implement, I think.  Especially given that such "optimization" will definitely slowdown some cases.  E.g. small set of tasks with `in` dependency can be followed single task with `inoutset` in a loop. Why should we slowdown this case?  Library has no idea of the structure of user's code.

In general, I don't like the idea when runtime library tries to optimize user's code.  Especially along with other changes in the same patch.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085



More information about the Openmp-commits mailing list