[llvm-dev] Intended behavior of CGSCC pass manager.

Wed Jun 8 14:55:26 PDT 2016

----- Original Message -----

> From: "Sean Silva" <chisophugis at gmail.com>
> To: "Mehdi Amini" <mehdi.amini at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
> Sent: Wednesday, June 8, 2016 4:25:09 PM
> Subject: Re: [llvm-dev] Intended behavior of CGSCC pass manager.

> On Wed, Jun 8, 2016 at 11:58 AM, Mehdi Amini < mehdi.amini at apple.com
> > wrote:

> > > On Jun 8, 2016, at 9:32 AM, Hal Finkel via llvm-dev <
> > > llvm-dev at lists.llvm.org > wrote:
> > 
> 

> > > > From: "Sean Silva via llvm-dev" < llvm-dev at lists.llvm.org >
> > > 
> > 
> 
> > > > To: "llvm-dev" < llvm-dev at lists.llvm.org >
> > > 
> > 
> 
> > > > Sent: Wednesday, June 8, 2016 6:19:03 AM
> > > 
> > 
> 
> > > > Subject: [llvm-dev] Intended behavior of CGSCC pass manager.
> > > 
> > 
> 

> > > > Hi Chandler, Philip, Mehdi, (and llvm-dev,)
> > > 
> > 
> 

> > > > (this is partially a summary of some discussions that happened
> > > > at
> > > > the
> > > > last LLVM bay area social, and partially a discussion about the
> > > > direction of the CGSCC pass manager)
> > > 
> > 
> 

> > > > A the last LLVM social we discussed the progress on the CGSCC
> > > > pass
> > > > manager. It seems like Chandler has a CGSCC pass manager
> > > > working,
> > > > but it is still unresolved exactly which semantics we want
> > > > (more
> > > > about this below) that are reasonably implementable.
> > > 
> > 
> 

> > > > AFAICT, there has been no public discussion about what exact
> > > > semantics we ultimately want to have. We should figure that
> > > > out.
> > > 
> > 
> 

> > > > The main difficulty which Chandler described is the apparently
> > > > quite
> > > > complex logic surrounding needing to run function passes nested
> > > > within an SCC pass manager, while providing some guarantees
> > > > about
> > > > exactly what order the function passes are run. The existing
> > > > CGSCC
> > > > pass manager just punts on some of the problems that arise
> > > > (look
> > > > in
> > > > CGPassManager::runOnModule, CGPassManager::RunAllPassesOnSCC,
> > > > and
> > > > CGPassManager::RunPassOnSCC in
> > > > llvm/lib/Analysis/CallGraphSCCPass.cpp), and these are the
> > > > problems
> > > > that Chandler has been trying to solve.
> > > 
> > 
> 

> > > > (
> > > 
> > 
> 
> > > > Why is this "function passes inside CGSCC passes" stuff
> > > > interesting?
> > > > Because LLVM can do inlining on an SCC (often just a single
> > > > function) and then run function passes to simplify the
> > > > function(s)
> > > > in the SCC before it tries to inline into a parent SCC. (the
> > > > SCC
> > > > visitation order is post-order)
> > > 
> > 
> 
> > > > For example, we may inline a bunch of code, but after inlining
> > > > we
> > > > can
> > > > tremendously simplify the function, and we want to do so before
> > > > considering this function for inlining into its callers so that
> > > > we
> > > > get an accurate evaluation of the inline cost.
> > > 
> > 
> 
> > > > Based on what Chandler said, it seems that LLVM is fairly
> > > > unique
> > > > in
> > > > this regard and other compilers don't do this (which is why we
> > > > can't
> > > > just look at how other compilers solve this problem; they don't
> > > > have
> > > > this problem (maybe they should? or maybe we shouldn't?)). For
> > > > example, he described that GCC uses different inlining
> > > > "phases";
> > > > e.g. it does early inlining on the entire module, then does
> > > > simplifications on the entire module, then does late inlining
> > > > on
> > > > the
> > > > entire module; so it is not able to incrementally simplify as
> > > > it
> > > > inlines like LLVM does.
> > > 
> > 
> 
> > > This incremental simplification is an important feature of our
> > > inliner, and one we should endeavor to keep. We might also want
> > > different phases at some point (e.g. a top-down and a bottom-up
> > > phase), but that's another story.
> > 
> 

> > > > )
> > > 
> > 
> 

> > > > As background for what is below, the LazyCallGraph tracks two
> > > > graphs:
> > > > the "call graph" and the "ref graph".
> > > 
> > 
> 
> > > > Conceptually, the call graph is the graph of direct calls,
> > > > where
> > > > indirect calls and calls to external functions do not appear
> > > > (or
> > > > are
> > > > connected to dummy nodes). The ref graph is basically the graph
> > > > of
> > > > all functions transitively accessible based on the
> > > > globals/constants/etc. referenced by a function (e.g. if a
> > > > function
> > > > `foo` references a vtable that is defined in the module, there
> > > > is
> > > > an
> > > > edge in the ref graph from `foo` to every function in the
> > > > vtable).
> > > 
> > 
> 
> > > > The call graph is a strict subset of the ref graph.
> > > 
> > 
> 

> > > > Chandler described that he had a major breakthrough in that the
> > > > CGSCC
> > > > pass manager only had to deal with 3 classes of modifications
> > > > that
> > > > can occur:
> > > 
> > 
> 
> > > > - a pass may e.g. propagate a load of a function pointer into
> > > > an
> > > > indirect call, turning it into an direct call. This requires
> > > > adding
> > > > an edge in the CG but not in the ref graph.
> > > 
> > 
> 
> > > > - a pass may take a direct call and turn it into an indirect
> > > > call.
> > > > This requires removing an edge from the CG, but not in the ref
> > > > graph.
> > > 
> > 
> 
> > > > - a pass may delete a direct call. This removes an edge in the
> > > > CG
> > > > and
> > > > also in the ref graph.
> > > 
> > 
> 

> > > > From the perspective of the CGSCC pass manager, these
> > > > operations
> > > > can
> > > > affect the SCC structure. Adding an edge might merge SCC's and
> > > > deleting an edge might split SCC's. Chandler mentioned that
> > > > apparently the issues of splitting and merging SCC's within the
> > > > current infrastructure are actually quite challenging and lead
> > > > to
> > > > e.g. iterator invalidation issues, and that is what he is
> > > > working
> > > > on.
> > > 
> > 
> 

> > > > (
> > > 
> > 
> 
> > > > The ref graph is important to guide the overall SCC visitation
> > > > order
> > > > because it basically represents "the largest graph that the CG
> > > > may
> > > > turn into due to our static analysis of this module". I.e. no
> > > > transformation we can statically make in the CGSCC passes can
> > > > ever
> > > > cause us to need to merge SCC's in the ref graph.
> > > 
> > 
> 
> > > > )
> > > 
> > 
> 

> > > > I have a couple overall questions/concerns:
> > > 
> > 
> 

> > > > 1. The ref graph can easily go quadratic. E.g.
> > > 
> > 
> 

> > > > typedef void (*fp)();
> > > 
> > 
> 
> > > > fp funcs[] = {
> > > 
> > 
> 
> > > > &foo1,
> > > 
> > 
> 
> > > > &foo2,
> > > 
> > 
> 
> > > > ...
> > > 
> > 
> 
> > > > &fooN
> > > 
> > 
> 
> > > > }
> > > 
> > 
> 
> > > > void foo1() { funcs[something](); }
> > > 
> > 
> 

> > > > void foo2() { funcs[something](); }
> > > 
> > 
> 
> > > > ...
> > > 
> > 
> 
> > > > void fooN() { funcs[something](); }
> > > 
> > 
> 

> > > > One real-world case where this might come about is in the
> > > > presence
> > > > of
> > > > vtables.
> > > 
> > 
> 

> > > > The existing CGSCC pass manager does not have this issue AFAIK
> > > > because it does not consider the ref graph.
> > > 
> > 
> 

> > > > Does anybody have any info/experience about how densely
> > > > connected
> > > > the
> > > > ref graph can get in programs that might reasonably be fed to
> > > > the
> > > > compiler?
> > > 
> > 
> 
> > > > I just did a quick sanity check with LLD/ELF using
> > > > `--lto-newpm-passes=cgscc(no-op-cgscc)` and it at least seemed
> > > > to
> > > > terminate / not run out of memory. Based on some rough
> > > > calculations
> > > > looking at the profile, it seem like the entire run of the
> > > > inliner
> > > > in the old LTO pipeline (which is about 5% of total LTO time on
> > > > this
> > > > particular example I looked at) is only 2-3x as expensive as
> > > > just
> > > > `--lto-newpm-passes=cgscc(no-op-cgscc)`, so the LazyCallGraph
> > > > construction is certainly not free.
> > > 
> > 
> 

> > > > 2. What is the intended behavior of CGSCC passes when SCC's are
> > > > split
> > > > or merged? E.g. a CGSCC pass runs on an SCC (e.g. the inliner).
> > > > Now
> > > > we run some function passes nested inside the CGSCC pass
> > > > manager
> > > > (e.g. to simplify things after inlining). Consider:
> > > 
> > 
> 
> > > This is not how I thought the current scheme worked ;) -- I was
> > > under
> > > the impression that we had a call graph with
> > > conservatively-connected dummy nodes for external/indirect
> > > functions.
> > 
> 
> > The fact that we have a separate nodes for calling into an external
> > function and "being called" from an external function, these don't
> > form SCC. So it means we can end up merging SCC IIUC.
> 
> Thanks for clarifying this.

> My intuition for this behavior (by looking at a limiting case) is
> that e.g. during FullLTO we have `main`, which is "externally
> called", so that any function containing an external function call
> or indirect call would may end up calling back into main from the
> compiler's perspective.
> This would result in a huge SCC containing `main` and all functions
> that transitively call a function containing an external function
> call / indirect call; this doesn't seem like the SCC we want to look
> at during inlining.
No, I thought we special-cased main. Or at least we did until we figured out that we couldn't do that for all languages, and so we introduced some kind of "no recurse" attribute? 

-Hal 

> I.e. the conservatively correct call graph is not the graph that we
> necessarily want during inlining. The LazyCallGraph "potential
> direct call" approach seems a better fit for what the inliner wants.

> The comments in LazyCallGraph.h do clarify this somewhat:
> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/LazyCallGraph.h#L14

> E.g. it says:
> ```

> /// NB: This is *not* a traditional call graph! It is a graph which
> models both
> /// the current calls and potential calls. As a consequence there are
> many
> /// edges in this call graph that do not correspond to a 'call' or
> 'invoke'
> /// instruction.
> ```

> -- Sean Silva

> > --
> 
> > Mehdi
> 

> > > As a result, there is no semantics-preserving optimization that
> > > will
> > > merge SCCs, only split them. In that case, I'd expect that once
> > > an
> > > SCC is split, we re-run the CGSCC passes over the newly-separated
> > > SCCs. But this corresponds to running over the "ref graph", as
> > > you
> > > describe it. I don't understand why we want the non-conservative
> > > graph.
> > 
> 

> > > > a) These function passes are e.g. now able to devirtualize a
> > > > call,
> > > > adding an edge to the CG, forming a larger CG SCC. Do you
> > > > re-run
> > > > the
> > > > CGSCC pass (say, the inliner) on this larger SCC?
> > > 
> > 
> 

> > > > b) These function passes are e.g. able to DCE a call, removing
> > > > an
> > > > edge from the CG. This converts, say, a CG SCC which is a cycle
> > > > graph (like a->b->c->a) into a path graph (a->b->c, with no
> > > > edge
> > > > back to a). The inliner had already visited a, b, and c as a
> > > > single
> > > > SCC. Now does it have to re-visit c, then b, then a, as
> > > > single-node
> > > > SCC's?
> > > 
> > 
> 

> > > > btw:
> > > 
> > 
> 

> > > > One way that I have found it useful to think about this is in
> > > > terms
> > > > of the visitation during Tarjan's SCC algorithm. I'll reference
> > > > the
> > > > pseudocode in
> > > > https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
> > > > . Inside the "strongconnect" routine when we have identified an
> > > > SCC
> > > > (the true branch of `if (v.lowlink = v.index)` test ) we can
> > > > visit
> > > > stack[v.index:stack.size()] as an SCC. This may or may not
> > > > invalidate some things on the stack (the variable `S` in the
> > > > pseudocode) and we may need to fix it up (e.g. inlining deleted
> > > > a
> > > > function, so we can't have an entry on the stack). Then, we can
> > > > run
> > > > function passes as we pop individual functions off the stack,
> > > > but
> > > > it
> > > > is easier to think about IMO than merging of SCC data
> > > > structures:
> > > > if
> > > > we add edges to the CG then we have to do more DFS on the new
> > > > edges
> > > > and if we delete edges then the DFS order of the stack gives us
> > > > certain guarantees.
> > > 
> > 
> 
> > > > Personally I find this much easier to reason about than the
> > > > description in terms of splitting and merging SCC's in the CG
> > > > and
> > > > ref graph (which the LazyCallGraph API makes one to think about
> > > > since it hides the underlying Tarjan's algorithm).
> > > 
> > 
> 
> > > > The LazyCallGraph API makes the current loop in
> > > > http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h;272124$100
> > > > very clean, but at least for my thinking about the problem, it
> > > > seems
> > > > like the wrong abstraction (and most of the LazyCallGraph API
> > > > seems
> > > > to be unused, so it seems like it may be overly heavyweight).
> > > 
> > 
> 
> > > > E.g. I think that maybe the easiest thing to do is to turn the
> > > > current approach inside out: instead of having the pass manager
> > > > logic be the "normal code" and forcing the Tarjan algorithm to
> > > > become a state machine of iterators, use an open-coded Tarjan
> > > > algorithm with some callbacks and make the pass management
> > > > logic
> > > > be
> > > > the state machine.
> > > 
> > 
> 
> > > > This will also open the door to avoiding the potentially
> > > > quadratic
> > > > size of the ref graph, since e.g. in the example I gave above,
> > > > we
> > > > can mark the `funcs` array itself as already having been
> > > > visited
> > > > during the walk. In the current LazyCallGraph, this would
> > > > require
> > > > adding some sort of notion of hyperedge.
> > > 
> > 
> 
> > > FWIW, I see no purpose in abstracting one algorithm, especially
> > > if
> > > that makes things algorithmically harder. Also, the LazyCallGraph
> > > abstraction and the iterator abstraction seem like separate
> > > issues.
> > > Iterator abstractions are often useful because you can use them
> > > in
> > > generic algorithms, etc.
> > 
> 

> > > > Since this is such a high priority (due to blocking PGO
> > > > inlining),
> > > > I
> > > > will probably try my hand at implementing the CGSCC pass
> > > > manager
> > > > sometime soon unless somebody beats me to it. (I'll probably
> > > > try
> > > > the
> > > > "open-coded SCC visit" approach).
> > > 
> > 
> 

> > > > Another possibility is implementing the new CGSCC pass manager
> > > > that
> > > > uses the same visitation semantics as the one in the old PM,
> > > > and
> > > > then we can refactor that as needed. In fact, that may be the
> > > > best
> > > > approach so that porting to the new PM is as NFC as possible
> > > > and
> > > > we
> > > > can isolate the functional (i.e., need benchmarks, measurements
> > > > ...)
> > > > changes in separate commits.
> > > 
> > 
> 
> > > I'm in favor of this approach for exactly the reason you mention.
> > > Being able to bisect regressions to the algorithmic change,
> > > separate
> > > from the infrastructure change, will likely make things easier in
> > > the long run (and will avoid the problem, to the extent possible,
> > > of
> > > performance regressions blocking the pass-manager work).
> > 
> 

> > > > Sorry for the wall of text.
> > > 
> > 
> 
> > > No problem; much appreciated.
> > 
> 

> > > -Hal
> > 
> 

> > > > -- Sean Silva
> > > 
> > 
> 
> > > > _______________________________________________
> > > 
> > 
> 
> > > > LLVM Developers mailing list
> > > 
> > 
> 
> > > > llvm-dev at lists.llvm.org
> > > 
> > 
> 
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > 
> > 
> 

> > > --
> > 
> 

> > > Hal Finkel
> > 
> 
> > > Assistant Computational Scientist
> > 
> 
> > > Leadership Computing Facility
> > 
> 
> > > Argonne National Laboratory
> > 
> 
> > > _______________________________________________
> > 
> 
> > > LLVM Developers mailing list
> > 
> 
> > > llvm-dev at lists.llvm.org
> > 
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160608/52f27de8/attachment-0001.html>