[PATCH] D24219: [LCG] Redesign the lazy post-order iteration mechanism for the LazyCallGraph to support repeated, stable iterations, even in the face of graph updates.

Tue Sep 13 22:39:11 PDT 2016

sanjoy accepted this revision.
sanjoy added a comment.
This revision is now accepted and ready to land.

lgtm modulo minor things here and there


================
Comment at: include/llvm/Analysis/LazyCallGraph.h:975
@@ +974,3 @@
+    assert(IndexIt != RefSCCIndices.end() && "RefSCC doesn't have an index!");
+    return IndexIt->second;
+  }
----------------
Can we add an `assert(PostOrderRefSCCs[IndexIt->second] == &RC)` here?

================
Comment at: lib/Analysis/LazyCallGraph.cpp:827
@@ +826,3 @@
+  // working backwards from the source using the parent set in each RefSCC,
+  // skipping any RefSCCs that don't fall in the postorder range. This has the
+  // advantage of walking the sparser parent edge (in high fan-out graphs) but
----------------
> But the simplification I have in mind will impact both this code and the SCC caller, so I'd rather do that in a follow-up if its OK.

SGTM.

================
Comment at: lib/Analysis/LazyCallGraph.cpp:852-876
@@ -869,9 +851,27 @@
 
-      // We fully explore the depth-first space, adding nodes to the connected
-      // set only as we pop them off, so "recurse" by rotating to the parent.
-      DFSStack.push_back({C, I});
-      C = &Parent;
-      I = C->parent_begin();
-      E = C->parent_end();
-    }
+  // Use a normal worklist to find which SCCs the target connects to. We still
+  // bound the search based on the range in the postorder list we care about,
+  // but because this is forward connectivity we just "recurse" through the
+  // edges.
+  auto ComputeTargetConnectedSet = [&](SmallPtrSetImpl<RefSCC *> &Set) {
+    Set.insert(this);
+    SmallVector<RefSCC *, 4> Worklist;
+    Worklist.push_back(this);
+    do {
+      RefSCC &RC = *Worklist.pop_back_val();
+      for (SCC &C : RC)
+        for (Node &N : C)
+          for (Edge &E : N) {
+            assert(E.getNode() && "Must have formed a node!");
+            RefSCC &EdgeRC = *G->lookupRefSCC(*E.getNode());
+            if (G->getRefSCCIndex(EdgeRC) <= SourceIdx)
+              // Not in the postorder sequence between source and target.
+              continue;
+
+            if (Set.insert(&EdgeRC).second)
+              Worklist.push_back(&EdgeRC);
+          }
+    } while (!Worklist.empty());
+  };
 
+  // Use a generic helper to update the postorder sequence of RefSCCs and return
----------------
Okay, I agree that `iterator_range<SmallVectorImpl<RefSCC *>::iterator>` isn't great. :)

However, I had made my suggestion under the assumption that `updatePostorderSequenceForEdgeInsertion` returns an `ArrayRef<RefSCC *>`.  Does changing `updatePostorderSequenceForEdgeInsertion` to return `ArrayRef<RefSCC *>` sound generally cleaner?

In any case, there's no reason to hold checking this on this specific point, it's very minor.

================
Comment at: lib/Analysis/LazyCallGraph.cpp:895
@@ -904,2 +894,3 @@
+    assert(RC != this &&
            "This RefSCC should terminate the DFS without being reached.");
 
----------------
Not clear what `"This RefSCC should terminate the DFS without being reached."` means now.

================
Comment at: lib/Analysis/LazyCallGraph.cpp:900
@@ +899,3 @@
+      if (!MergeSet.count(ParentRC))
+        Parents.insert(ParentRC);
+    RC->Parents.clear();
----------------
Ok.

================
Comment at: lib/Analysis/LazyCallGraph.cpp:1222
@@ +1221,3 @@
+  // Insert the resulting postorder sequence into the global graph postorder
+  // sequence before the current RefSCC in that sequnece. The idea being that
+  // this RefSCC is the target of the reference edge removed, and thus has
----------------
There's one more ("sequnece").

================
Comment at: unittests/Analysis/LazyCallGraphTest.cpp:142
@@ +141,3 @@
+     "define void @a1() {\n"
+     "entry:\n"
+     "  %a = alloca void ()*\n"
----------------
Are the trailing `\n` 's needed?  Seems like we should be able to replace them with spaces (which may look less distracting).

================
Comment at: unittests/Analysis/LazyCallGraphTest.cpp:581
@@ -480,3 +580,3 @@
   for (LazyCallGraph::RefSCC &RC : CG.postorder_ref_sccs())
-    (void)RC;
+    dbgs() << "Formed RefSCC: " << RC << "\n";
 
----------------
Writing to `dbgs()` seems fairly uncommon in `unittests/` (I only see one other instance).  I 
can't see any obvious reason why this will be a problem, but perhaps there is a reason why it is usually avoided?


================
Comment at: unittests/Analysis/LazyCallGraphTest.cpp:895
@@ +894,3 @@
+  LazyCallGraph::Node &B1 = *CG.lookup(lookupFunction(*M, "b1"));
+  LazyCallGraph::Node &B2 = *CG.lookup(lookupFunction(*M, "b2"));
+  LazyCallGraph::Node &B3 = *CG.lookup(lookupFunction(*M, "b3"));
----------------
Can this sequence be a C macro or a C++ class with `A1`, `A2` etc. as fields (in the latter case all of the lookup logic could be in a constructor or some helper function)?  Right now this is repeated a few times.


================
Comment at: unittests/Analysis/LazyCallGraphTest.cpp:1018
@@ +1017,3 @@
+  EXPECT_EQ(3u, MergedRCs.size());
+  EXPECT_NE(std::find(MergedRCs.begin(), MergedRCs.end(), &BRC), MergedRCs.end());
+  EXPECT_NE(std::find(MergedRCs.begin(), MergedRCs.end(), &CRC), MergedRCs.end());
----------------
`llvm::find` (here and below)?

================
Comment at: unittests/Analysis/LazyCallGraphTest.cpp:1082
@@ +1081,3 @@
+
+  // Connect the top to the bottom forming a large RefSCC made up mostly of calls.
+  auto MergedRCs = ARC.insertIncomingRefEdge(D, A);
----------------
@eraman 's comment has not been addressed.


https://reviews.llvm.org/D24219