[PATCH] D29381: [PM/LCG] Remove the lazy RefSCC formation from the LazyCallGraph during iteration.

Mon Feb 6 01:20:11 PST 2017

chandlerc marked 8 inline comments as done.
chandlerc added a comment.

Thanks, comments hopefully addressed and new patch incoming.

================
Comment at: include/llvm/Analysis/LazyCallGraph.h:1086-1090
+  template <typename RootsT, typename GetBeginT, typename GetEndT,
+            typename GetNodeT, typename FormSCCCallbackT>
+  static void buildGenericSCCs(RootsT &&Roots, GetBeginT &&GetBegin,
+                               GetEndT &&GetEnd, GetNodeT &&GetNode,
+                               FormSCCCallbackT &&FormSCC);
----------------
davide wrote:
> I think this API deserves a comment, also for consistency with everything else in the file.
Sorry, I had meant to go back through all these comments after I got this stuff working and add the ones missing and clean up the ones that were no longer accurate. Only some got done. I've commented more effectively now (i hope) but please tell me if they could be improved further.

================
Comment at: lib/Analysis/LazyCallGraph.cpp:1690
           // node onto the stack.
           DFSStack.push_back({N, I});

----------------
davide wrote:
> dberlin wrote:
> > davide wrote:
> > > dberlin wrote:
> > > > Just to note, you don't actually have to push the stack here.
> > > > 
> > > > This will push every node onto the stack.
> > > > it's possible to push only the root nodes onto the stack.
> > > > 
> > > > This is more memory efficient, and about 25% faster.
> > > > 
> > > > This is  nuutila's variant, which is easy to implement:
> > > > 
> > > > http://www.cse.tkk.fi/~enu/ps/ipl-scc.ps
> > > > or
> > > > http://www.cse.tkk.fi/~enu/ps/tko-b94-scc.ps
> > > > or 
> > > > the tarjan scc part of https://reviews.llvm.org/D28934
> > > > 
> > > > (For the papers, you want algorithm 1, not 2)
> > > > 
> > > My understanding of Nuutila is that it's faster if the graph is sparse, and when I spoke and proposed this to Chandler he mentioned the graph maybe not-that-sparse (for some definition of). I never found the SCC computation being a bottleneck (even for large programs), but I like Nuutila's variant so I wouldn't be opposed to that (and I think it's worth a try)
> > It's either faster or the same in every case, and given it's switching like two lines, IMHO, i'd do it, but ¯\_(ツ)_/¯
> > 
> > The real speed is unrelated to whether it's a sparse or not. What it's really related to is what happens on trees/dags.
> > 
> > Tarjan's algorithm pushes always
> > Nuutila's variant only pushes if it's actually a cycle.
> > So nodes that are parts of trees/dags, will not end up on the stack.
> > 
> STGM =)
FWIW, if the Nuutila variant can be implement cleanly and shows a performance improvement for some graphs (both of which I assume are true) then of course we should switch. =]

That said, I don't want to switch in *this* patch because my intent here is to very minimally relocate existing code.

I've added this as a FIXME to the missing comment on the routine though. =]

================
Comment at: lib/Analysis/LazyCallGraph.cpp:1692-1693

-          assert(!lookupSCC(ChildN) &&
-                 "Found a node with 0 DFS number but already in an SCC!");
+          //assert(!lookupSCC(ChildN) &&
+          //       "Found a node with 0 DFS number but already in an SCC!");
           ChildN.DFSNumber = ChildN.LowLink = NextDFSNumber++;
----------------
davide wrote:
> Commented code?
An assert that I wasn't sure could be implemented generically. Indeed, it can't. That said, we have lots of other asserts and tests that should catch any issue like this, I just forgot to clean it up. Thanks for spotting1

================
Comment at: lib/Analysis/LazyCallGraph.cpp:1797-1798
+
+  std::reverse(Roots.begin(), Roots.end());
+
+  buildGenericSCCs(
----------------
davide wrote:
> Comment on why you need this reverse, maybe?
I've commented. I'm a bit torn. Not sure we should do reverse at all really, but if I don't do reverse, all of the iterations in the tests go in the opposite order and it seemed a lot of churn for no benefit. I think the reason is that we operate on this as a stack and so reversed gives the most unsurprising semantics.

That said, if you disagree I can remove this completely and just update the tests.

https://reviews.llvm.org/D29381