[PATCH] D72350: [DDG] Data Dependence Graph - Graph Simplification

Fri Jan 10 13:29:28 PST 2020

bmahjour marked 3 inline comments as done.
bmahjour added inline comments.

================
Comment at: llvm/lib/Analysis/DependenceGraphBuilder.cpp:426
+      for (NodeType *N : Graph)
+        if (N->hasEdgeTo(EC.first->back().getTargetNode()))
+          ++EC.second;
----------------
Meinersbur wrote:
> bmahjour wrote:
> > Meinersbur wrote:
> > > [serious] This is an O(V^2*max(E)) worst case algorithm (`hasEdgeTo` is a linear search) reinvoked over each fixpoint round, i.e. potentially quadratic (maybe less because of the `EC.second > 1` bailout and fixpoint probably not linear over node count). To count the number of incoming edges, iterate over all edges, lookup the target node in `CandidateSrcNodes` and increase its counter. That's just O(E). To further reduce computation time, `CandidateSrcNodes` could be updated when nodes are merged instead of recomputing it.
> > The DirectedGraph implementation does not hold a list of all edges (to save space), so we first need to compute it. Note that in the worst case (where the graph is complete) walking over all edges has O(V^2) complexity, so the whole algorithm will have O(V^2 + V) complexity. 
> > 
> > Note also that the CandidateSrcNodes map is a map from the candidate source nodes (not the target nodes), so we would need another map to store the incoming edge count of all nodes and then cross reference it as we walk CandidateSrcNodes and update it. 
> > 
> > We would need to introduce a map and a list with worst-case space complexities O(V) and O(V^2) respectively. Having said that in the case of complete graphs, we have `max(E) == V` so in comparison with the above algorithm, it would still be an improvement in terms of computation.
> > 
> > If you still think the additional memory complexity increase is justified, I can change it as described.
> I'd expect a typical dependence graph to be a sequence of use-def chains: a->b->c->d->e->f->... . This would already have O(V) for your algorithm as `CandidateSrcNodes` contains all nodes except the last. Compete graphs should be collapsed to a sing Pi node and we may consider removing transitive dependencies such that these chains are more likely.
> 
> Additional O(V) space allocations (A DenseMap<NodeType>) should be less important since already storing the nodes takes O(V) space. I don't understand where the O(V^2) comes from.
> 
> I think of the following:
> ```
> DenseMap<NodeType> NumberOfIncomingEdges;
> for (NodeType *N : Graph)
>   for (EdgeType *E: N)
>     NumberOfIncomingEdges[E->getTargetNode()] += 1;
> ```
> and the requirement then be checked on the fly:
> ```
> for (NodeType *Src : Graph) {
>   if (Src->getEdges().size() != 1)
>     continue;
>   EdgeType *E = Src.back();
>   if (!E->isUseDef())
>     continue;
>   NodeType *Tgt = E->getTargetNode();
>   if (NumberOfIncomingEdges[Tgt] != 1)
>     continue;
> ```
> First part is O(V+E) (or just E assuming that E>V), second is O(V).
> 
> We could even compute CandidateSrcNodes` using `NumberOfIncomingEdges` with fewer operations:
> ```
> for (NodeType *N : Graph) {
>   if (N->getEdges().size() != 1)
>     continue;
>   for (EdgeType *E: N)
>     CandidateSrcNodes[N] += NumberOfIncomingEdges[E->getTargetNode()].
> }
> ```
> with O(V+E).
> 
> But also also think you put more thoughts into it than I just did, so what am I missing?
> Compete graphs should be collapsed to a sing Pi node and we may consider removing transitive dependencies such that these chains are more likely.
In this patch simplify() is done before pi-block creation, so that the SCC walks can benefit from the reduction of node/edges as a result of simplification.

> I don't understand where the O(V^2) comes from
The number of edges in a complete graph is equal to V(V-1)/2, which leads to the quadratic complexity in number of nodes if we walk over it.

================
Comment at: llvm/lib/Analysis/DependenceGraphBuilder.cpp:455
+    ++TotalSimplificationRounds;
+  } while (DidSomething);
+  LLVM_DEBUG(dbgs() << "=== End of Graph Simplification ===\n");
----------------
Meinersbur wrote:
> bmahjour wrote:
> > Meinersbur wrote:
> > > I wonder whether a fixpoint is even necessary. In what cases would we need a second round?
> > It's meant to catch cases like this:
> > 
> > `{(a)->(b), (b)->(c), (c)->(d), (e)->(d)}`
> > 
> > the goal is to turn it into:
> > 
> > `{(a,b,c)->(d), (e)->(d)}`
> > 
> > I think I can get rid of the fixpoint algorithm by making use of the CandidateSrcNodes as a worklist. I'll post an update soon.
> In this example a->b and b->c can both be merged in the same round and with a bit of care, the merge of one of the would not inhibit merging the other in the same round.
Please take a look at the updated patch, where I removed the fix point iteration. Does that look good to you?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72350/new/

https://reviews.llvm.org/D72350