[Openmp-commits] [PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

Tue May 18 04:52:09 PDT 2021

protze.joachim added inline comments.

================
Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+        // link node as successor of all nodes in the prev_set if any
+        npredecessors +=
+            __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
----------------
AndreyChurbanov wrote:
> protze.joachim wrote:
> > If I understand this code right, this has O(n^2) complexity for two sets of size n?
> > 
> > Consider:
> > ```
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(in:A)
> > work(A,i);
> > }
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(inoutset:A)
> > work(A,i);
> > }
> > ```
> > All n tasks in the first loop would be predecessor for each of the tasks in the second loop. This will also result in n^2 releases/notifications.
> > 
> > 
> > I'd suggest to model the dependencies like:
> > ```
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(in:A)
> > work(A,i);
> > }
> > #pragma omp taskwait depend(inout:A) nowait
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(inoutset:A)
> > work(A,i);
> > }
> > ```
> > I.e., add a dummy dependency node between the two sets, which has n predecessors and n successors. You probably understand the depnode code better, than I do, but I think you would only need to add some code in line 357, where `last_set` is moved to `prev_set`.
> > This dummy node would reduce linking and releasing complexity to O(n).
> This could be a separate research project.  Because such change may cause performance regressions on some real codes, so it needs thorough investigation.  I mean insertion of an auxiliary dummy node into dependency graph is definitely an overhead, which may or may not be overridden by reduction in number of edges in the graph.  Or it can be made optional optimization under an external control, if there is a significant gain in some particular case.
I don't suggest to insert the auxiliary node in the general case. Just for the case that you see a transition of `in` -> `inoutset` or vice versa.

With current task dependencies, you always have 1 node notifying n nodes (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an amortized O(1) operation per task in the graph.

Here you introduce a code pattern, where n nodes each will need to notify m other nodes. This leads to an O(n) operation per task. I'm really worried about the complexity of this pattern.
If there is no use case with large n, I don't understand, why `inoutset` was introduced in the first place.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085