[llvm-dev] Intended behavior of CGSCC pass manager.

Thu Jun 16 10:13:00 PDT 2016

----- Original Message -----

> From: "Sean Silva via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Xinliang David Li" <davidxl at google.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, June 16, 2016 6:48:30 AM
> Subject: Re: [llvm-dev] Intended behavior of CGSCC pass manager.

> On Tue, Jun 14, 2016 at 11:29 PM, Xinliang David Li <
> davidxl at google.com > wrote:

> > On Thu, Jun 9, 2016 at 3:26 AM, Sean Silva < chisophugis at gmail.com
> > >
> > wrote:
> 

> > > On Wed, Jun 8, 2016 at 8:14 PM, Xinliang David Li <
> > > davidxl at google.com > wrote:
> > 
> 

> > > > On Wed, Jun 8, 2016 at 4:38 PM, Sean Silva <
> > > > chisophugis at gmail.com
> > > > >
> > > > wrote:
> > > 
> > 
> 

> > > > > On Wed, Jun 8, 2016 at 12:31 PM, Xinliang David Li <
> > > > > davidxl at google.com > wrote:
> > > > 
> > > 
> > 
> 

> > > > > > On Wed, Jun 8, 2016 at 4:19 AM, Sean Silva <
> > > > > > chisophugis at gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > Hi Chandler, Philip, Mehdi, (and llvm-dev,)
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > (this is partially a summary of some discussions that
> > > > > > > happened
> > > > > > > at
> > > > > > > the
> > > > > > > last LLVM bay area social, and partially a discussion
> > > > > > > about
> > > > > > > the
> > > > > > > direction of the CGSCC pass manager)
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > A the last LLVM social we discussed the progress on the
> > > > > > > CGSCC
> > > > > > > pass
> > > > > > > manager. It seems like Chandler has a CGSCC pass manager
> > > > > > > working,
> > > > > > > but it is still unresolved exactly which semantics we
> > > > > > > want
> > > > > > > (more
> > > > > > > about this below) that are reasonably implementable.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > AFAICT, there has been no public discussion about what
> > > > > > > exact
> > > > > > > semantics we ultimately want to have. We should figure
> > > > > > > that
> > > > > > > out.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > The main difficulty which Chandler described is the
> > > > > > > apparently
> > > > > > > quite
> > > > > > > complex logic surrounding needing to run function passes
> > > > > > > nested
> > > > > > > within an SCC pass manager, while providing some
> > > > > > > guarantees
> > > > > > > about
> > > > > > > exactly what order the function passes are run. The
> > > > > > > existing
> > > > > > > CGSCC
> > > > > > > pass manager just punts on some of the problems that
> > > > > > > arise
> > > > > > > (look
> > > > > > > in
> > > > > > > CGPassManager::runOnModule,
> > > > > > > CGPassManager::RunAllPassesOnSCC,
> > > > > > > and
> > > > > > > CGPassManager::RunPassOnSCC in
> > > > > > > llvm/lib/Analysis/CallGraphSCCPass.cpp), and these are
> > > > > > > the
> > > > > > > problems
> > > > > > > that Chandler has been trying to solve.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > (
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Why is this "function passes inside CGSCC passes" stuff
> > > > > > > interesting?
> > > > > > > Because LLVM can do inlining on an SCC (often just a
> > > > > > > single
> > > > > > > function) and then run function passes to simplify the
> > > > > > > function(s)
> > > > > > > in the SCC before it tries to inline into a parent SCC.
> > > > > > > (the
> > > > > > > SCC
> > > > > > > visitation order is post-order)
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > For example, we may inline a bunch of code, but after
> > > > > > > inlining
> > > > > > > we
> > > > > > > can
> > > > > > > tremendously simplify the function, and we want to do so
> > > > > > > before
> > > > > > > considering this function for inlining into its callers
> > > > > > > so
> > > > > > > that
> > > > > > > we
> > > > > > > get an accurate evaluation of the inline cost.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Based on what Chandler said, it seems that LLVM is fairly
> > > > > > > unique
> > > > > > > in
> > > > > > > this regard and other compilers don't do this (which is
> > > > > > > why
> > > > > > > we
> > > > > > > can't
> > > > > > > just look at how other compilers solve this problem; they
> > > > > > > don't
> > > > > > > have
> > > > > > > this problem (maybe they should? or maybe we
> > > > > > > shouldn't?)).
> > > > > > > For
> > > > > > > example, he described that GCC uses different inlining
> > > > > > > "phases";
> > > > > > > e.g. it does early inlining on the entire module, then
> > > > > > > does
> > > > > > > simplifications on the entire module, then does late
> > > > > > > inlining
> > > > > > > on
> > > > > > > the
> > > > > > > entire module; so it is not able to incrementally
> > > > > > > simplify
> > > > > > > as
> > > > > > > it
> > > > > > > inlines like LLVM does.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > )
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > As background for what is below, the LazyCallGraph tracks
> > > > > > > two
> > > > > > > graphs:
> > > > > > > the "call graph" and the "ref graph".
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Conceptually, the call graph is the graph of direct
> > > > > > > calls,
> > > > > > > where
> > > > > > > indirect calls and calls to external functions do not
> > > > > > > appear
> > > > > > > (or
> > > > > > > are
> > > > > > > connected to dummy nodes). The ref graph is basically the
> > > > > > > graph
> > > > > > > of
> > > > > > > all functions transitively accessible based on the
> > > > > > > globals/constants/etc. referenced by a function (e.g. if
> > > > > > > a
> > > > > > > function
> > > > > > > `foo` references a vtable that is defined in the module,
> > > > > > > there
> > > > > > > is
> > > > > > > an
> > > > > > > edge in the ref graph from `foo` to every function in the
> > > > > > > vtable).
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > The call graph is a strict subset of the ref graph.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > Chandler described that he had a major breakthrough in
> > > > > > > that
> > > > > > > the
> > > > > > > CGSCC
> > > > > > > pass manager only had to deal with 3 classes of
> > > > > > > modifications
> > > > > > > that
> > > > > > > can occur:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > - a pass may e.g. propagate a load of a function pointer
> > > > > > > into
> > > > > > > an
> > > > > > > indirect call, turning it into an direct call. This
> > > > > > > requires
> > > > > > > adding
> > > > > > > an edge in the CG but not in the ref graph.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > - a pass may take a direct call and turn it into an
> > > > > > > indirect
> > > > > > > call.
> > > > > > > This requires removing an edge from the CG, but not in
> > > > > > > the
> > > > > > > ref
> > > > > > > graph.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > - a pass may delete a direct call. This removes an edge
> > > > > > > in
> > > > > > > the
> > > > > > > CG
> > > > > > > and
> > > > > > > also in the ref graph.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > From the perspective of the CGSCC pass manager, these
> > > > > > > operations
> > > > > > > can
> > > > > > > affect the SCC structure. Adding an edge might merge
> > > > > > > SCC's
> > > > > > > and
> > > > > > > deleting an edge might split SCC's. Chandler mentioned
> > > > > > > that
> > > > > > > apparently the issues of splitting and merging SCC's
> > > > > > > within
> > > > > > > the
> > > > > > > current infrastructure are actually quite challenging and
> > > > > > > lead
> > > > > > > to
> > > > > > > e.g. iterator invalidation issues, and that is what he is
> > > > > > > working
> > > > > > > on.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > (
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > The ref graph is important to guide the overall SCC
> > > > > > > visitation
> > > > > > > order
> > > > > > > because it basically represents "the largest graph that
> > > > > > > the
> > > > > > > CG
> > > > > > > may
> > > > > > > turn into due to our static analysis of this module".
> > > > > > > I.e.
> > > > > > > no
> > > > > > > transformation we can statically make in the CGSCC passes
> > > > > > > can
> > > > > > > ever
> > > > > > > cause us to need to merge SCC's in the ref graph.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > )
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > I have a couple overall questions/concerns:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > 1. The ref graph can easily go quadratic. E.g.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > typedef void (*fp)();
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > fp funcs[] = {
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > &foo1,
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > &foo2,
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > ...
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > &fooN
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > }
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > void foo1() { funcs[something](); }
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > void foo2() { funcs[something](); }
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > ...
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > void fooN() { funcs[something](); }
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > One real-world case where this might come about is in the
> > > > > > > presence
> > > > > > > of
> > > > > > > vtables.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > The existing CGSCC pass manager does not have this issue
> > > > > > > AFAIK
> > > > > > > because it does not consider the ref graph.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > Does anybody have any info/experience about how densely
> > > > > > > connected
> > > > > > > the
> > > > > > > ref graph can get in programs that might reasonably be
> > > > > > > fed
> > > > > > > to
> > > > > > > the
> > > > > > > compiler?
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > I just did a quick sanity check with LLD/ELF using
> > > > > > > `--lto-newpm-passes=cgscc(no-op-cgscc)` and it at least
> > > > > > > seemed
> > > > > > > to
> > > > > > > terminate / not run out of memory. Based on some rough
> > > > > > > calculations
> > > > > > > looking at the profile, it seem like the entire run of
> > > > > > > the
> > > > > > > inliner
> > > > > > > in the old LTO pipeline (which is about 5% of total LTO
> > > > > > > time
> > > > > > > on
> > > > > > > this
> > > > > > > particular example I looked at) is only 2-3x as expensive
> > > > > > > as
> > > > > > > just
> > > > > > > `--lto-newpm-passes=cgscc(no-op-cgscc)`, so the
> > > > > > > LazyCallGraph
> > > > > > > construction is certainly not free.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Conceptually, reference graph should also include variable
> > > > > > nodes.
> > > > > > With variable nodes introduced, the quadratic behavior
> > > > > > mentioned
> > > > > > can
> > > > > > be avoided.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > Yeah, this is what I was trying to get at with the statement
> > > > > "
> > > > > In
> > > > > the
> > > > > current LazyCallGraph, this would require adding some sort of
> > > > > notion
> > > > > of hyperedge. "
> > > > 
> > > 
> > 
> 
> > > > > But you are right: from an implementation perspective of a
> > > > > call
> > > > > graph
> > > > > data structure that is trying to model the "ref graph" of
> > > > > LazyCallGraph, it is cleaner to have nodes that are not
> > > > > functions
> > > > > than to introduce a notion of hyperedge.
> > > > 
> > > 
> > 
> 

> > > > > > > 2. What is the intended behavior of CGSCC passes when
> > > > > > > SCC's
> > > > > > > are
> > > > > > > split
> > > > > > > or merged? E.g. a CGSCC pass runs on an SCC (e.g. the
> > > > > > > inliner).
> > > > > > > Now
> > > > > > > we run some function passes nested inside the CGSCC pass
> > > > > > > manager
> > > > > > > (e.g. to simplify things after inlining). Consider:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > a) These function passes are e.g. now able to
> > > > > > > devirtualize
> > > > > > > a
> > > > > > > call,
> > > > > > > adding an edge to the CG, forming a larger CG SCC. Do you
> > > > > > > re-run
> > > > > > > the
> > > > > > > CGSCC pass (say, the inliner) on this larger SCC?
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > b) These function passes are e.g. able to DCE a call,
> > > > > > > removing
> > > > > > > an
> > > > > > > edge from the CG. This converts, say, a CG SCC which is a
> > > > > > > cycle
> > > > > > > graph (like a->b->c->a) into a path graph (a->b->c, with
> > > > > > > no
> > > > > > > edge
> > > > > > > back to a). The inliner had already visited a, b, and c
> > > > > > > as
> > > > > > > a
> > > > > > > single
> > > > > > > SCC. Now does it have to re-visit c, then b, then a, as
> > > > > > > single-node
> > > > > > > SCC's?
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > btw:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > One way that I have found it useful to think about this
> > > > > > > is
> > > > > > > in
> > > > > > > terms
> > > > > > > of the visitation during Tarjan's SCC algorithm. I'll
> > > > > > > reference
> > > > > > > the
> > > > > > > pseudocode in
> > > > > > > https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
> > > > > > > . Inside the "strongconnect" routine when we have
> > > > > > > identified
> > > > > > > an
> > > > > > > SCC
> > > > > > > (the true branch of `if (v.lowlink = v.index)` test ) we
> > > > > > > can
> > > > > > > visit
> > > > > > > stack[v.index:stack.size()] as an SCC. This may or may
> > > > > > > not
> > > > > > > invalidate some things on the stack (the variable `S` in
> > > > > > > the
> > > > > > > pseudocode) and we may need to fix it up (e.g. inlining
> > > > > > > deleted
> > > > > > > a
> > > > > > > function, so we can't have an entry on the stack). Then,
> > > > > > > we
> > > > > > > can
> > > > > > > run
> > > > > > > function passes as we pop individual functions off the
> > > > > > > stack,
> > > > > > > but
> > > > > > > it
> > > > > > > is easier to think about IMO than merging of SCC data
> > > > > > > structures:
> > > > > > > if
> > > > > > > we add edges to the CG then we have to do more DFS on the
> > > > > > > new
> > > > > > > edges
> > > > > > > and if we delete edges then the DFS order of the stack
> > > > > > > gives
> > > > > > > us
> > > > > > > certain guarantees.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Personally I find this much easier to reason about than
> > > > > > > the
> > > > > > > description in terms of splitting and merging SCC's in
> > > > > > > the
> > > > > > > CG
> > > > > > > and
> > > > > > > ref graph (which the LazyCallGraph API makes one to think
> > > > > > > about
> > > > > > > since it hides the underlying Tarjan's algorithm).
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > The LazyCallGraph API makes the current loop in
> > > > > > > http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h;272124$100
> > > > > > > very clean, but at least for my thinking about the
> > > > > > > problem,
> > > > > > > it
> > > > > > > seems
> > > > > > > like the wrong abstraction (and most of the LazyCallGraph
> > > > > > > API
> > > > > > > seems
> > > > > > > to be unused, so it seems like it may be overly
> > > > > > > heavyweight).
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > E.g. I think that maybe the easiest thing to do is to
> > > > > > > turn
> > > > > > > the
> > > > > > > current approach inside out: instead of having the pass
> > > > > > > manager
> > > > > > > logic be the "normal code" and forcing the Tarjan
> > > > > > > algorithm
> > > > > > > to
> > > > > > > become a state machine of iterators, use an open-coded
> > > > > > > Tarjan
> > > > > > > algorithm with some callbacks and make the pass
> > > > > > > management
> > > > > > > logic
> > > > > > > be
> > > > > > > the state machine.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > This will also open the door to avoiding the potentially
> > > > > > > quadratic
> > > > > > > size of the ref graph, since e.g. in the example I gave
> > > > > > > above,
> > > > > > > we
> > > > > > > can mark the `funcs` array itself as already having been
> > > > > > > visited
> > > > > > > during the walk. In the current LazyCallGraph, this would
> > > > > > > require
> > > > > > > adding some sort of notion of hyperedge.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > Since this is such a high priority (due to blocking PGO
> > > > > > > inlining),
> > > > > > > I
> > > > > > > will probably try my hand at implementing the CGSCC pass
> > > > > > > manager
> > > > > > > sometime soon unless somebody beats me to it. (I'll
> > > > > > > probably
> > > > > > > try
> > > > > > > the
> > > > > > > "open-coded SCC visit" approach).
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > Another possibility is implementing the new CGSCC pass
> > > > > > > manager
> > > > > > > that
> > > > > > > uses the same visitation semantics as the one in the old
> > > > > > > PM,
> > > > > > > and
> > > > > > > then we can refactor that as needed. In fact, that may be
> > > > > > > the
> > > > > > > best
> > > > > > > approach so that porting to the new PM is as NFC as
> > > > > > > possible
> > > > > > > and
> > > > > > > we
> > > > > > > can isolate the functional (i.e., need benchmarks,
> > > > > > > measurements
> > > > > > > ...)
> > > > > > > changes in separate commits.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > A very high level comment: why do we need to update
> > > > > > callgraph
> > > > > > on
> > > > > > the
> > > > > > fly ? Can we have a more general support of iterative SCC
> > > > > > pass
> > > > > > invocation?
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > something like:
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > 1) build the callgraph
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 2) cache the post-order traversal order
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > 3) if the order list is empty -- done
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 4) traversal: invoke function passes for each function on
> > > > > > the
> > > > > > order
> > > > > > (step 2 or 5). The call graph gets updated on the fly (with
> > > > > > new
> > > > > > edges, or new nodes for cloned functions)
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 5) update the function traversal order from new nodes and
> > > > > > new
> > > > > > edges
> > > > > > created in 4)
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > 6) go to step 3).
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > (sorry for the delayed reply... this is a very poignant
> > > > > question
> > > > > /
> > > > > example)
> > > > 
> > > 
> > 
> 

> > > > > From the discussion with Chandler, I think he wants to
> > > > > provide
> > > > > more
> > > > > guarantees to function passes about the visitation order. He
> > > > > will
> > > > > need to explain his exact concerns. But IIRC the essence of
> > > > > one
> > > > > of
> > > > > the issues is captured in the example I gave in 2.b) in the
> > > > > OP:
> > > > 
> > > 
> > 
> 

> > > > > " These function passes are e.g. able to DCE a call, removing
> > > > > an
> > > > > edge
> > > > > from the CG. This converts, say, a CG SCC which is a cycle
> > > > > graph
> > > > > (like a->b->c->a) into a path graph (a->b->c, with no edge
> > > > > back
> > > > > to
> > > > > a)."
> > > > 
> > > 
> > 
> 

> > > > > The issue that I remember Chandler brought up is that before
> > > > > deleting
> > > > > an edge from an SCC, the nodes in the SCC are "unordered"
> > > > > w.r.t.
> > > > > each other; but after deleting an edge from the CG which
> > > > > splits
> > > > > the
> > > > > SCC, there is an order.
> > > > 
> > > 
> > 
> 
> > > > > In other words, that the cached post-order may no longer be a
> > > > > post-order traversal of the new graph after a function pass
> > > > > runs.
> > > > > For example, in the cached post-order traversal we may have
> > > > > entered
> > > > > the SCC `a->b->c->a` via an edge `x->b` and so our cached
> > > > > post-order
> > > > > traversal visits the functions in the order `a, then c, then
> > > > > b`.
> > > > > If
> > > > > a function pass visits `c` and deletes the call `c->a`, then
> > > > > "a,
> > > > > then c, then b" is not a valid post-order for visiting the
> > > > > functions.
> > > > 
> > > 
> > 
> 

> > > > > Looking at the explanation I gave, I think there isn't really
> > > > > a
> > > > > problem here. The cached post-order can only be invalidated
> > > > > for
> > > > > functions that have already been visited.
> > > > 
> > > 
> > 
> 
> > > > If we merge CGSCC based transformation passes with an IPA
> > > > analysis
> > > > pass that requires strict/valid bottom up order, then updating
> > > > the
> > > > SCCs on the fly might be needed (or not if the result will
> > > > still
> > > > be
> > > > conservatively correct) -- but interleaving IPA analyses with
> > > > IPA
> > > > transformations like this is not the right thing to do also for
> > > > other reasons.
> > > 
> > 
> 

> > > > Assuming CGSCC pass just does a predefined set of
> > > > transformations
> > > > at
> > > > function level with some order, there is no issue of
> > > > 'invalidation'
> > > > -- the cached order will always be valid. In 2.b), why bother
> > > > to
> > > > revisit the nodes?
> > > 
> > 
> 
> > > Indeed. That is the type of doubt for why I was asking this
> > > question
> > > in the OP.
> > 
> 
> > > > > Speaking more broadly about the algorithm you just described,
> > > > > did
> > > > > you
> > > > > intend to omit an SCC visitation step?
> > > > 
> > > 
> > 
> 
> > > > It is independent of this decision. However we can actually go
> > > > back
> > > > one step and ask the question: is adding this layer really
> > > > necessary? Why not just traversing the cgraph nodes in reverse
> > > > topo-order (after removing the cycles)?
> > > 
> > 
> 

> > > > > The goal of the CGSCC pass manager is the ability to visit an
> > > > > SCC
> > > > > (e.g. inliner visits an SCC), then immediately run function
> > > > > passes
> > > > > to simplify the result of inlining.
> > > > 
> > > 
> > 
> 
> > > > This is how current implementation is. Is it a fundamental
> > > > requirement?
> > > 
> > 
> 
> > > I think this is a good question. Hopefully Chandler will be able
> > > to
> > > reply to this thread. From what I can tell, the exact
> > > requirements
> > > have never really been discussed.
> > 
> 

> > > > > Only after the simplification has occurred do we visit the
> > > > > parent
> > > > > SCC. By running the simplifications in lock-step with the
> > > > > inliner
> > > > > SCC visitation we give parent SCC's a more accurate view of
> > > > > the
> > > > > real
> > > > > cost of inlining a function from a child SCC.
> > > > 
> > > 
> > 
> 
> > > > This works for all bottom up scheme regardless of a SCC layer
> > > > is
> > > > added or not.
> > > 
> > 
> 

> > > > > Issues similar to 2.a) (i.e. adding edges to the CG) also
> > > > > affect
> > > > > the
> > > > > visitation order of SCC's (not just functions). For example,
> > > > > we
> > > > > visit an SCC with a CGSCC pass (e.g. inliner). Then we run
> > > > > the
> > > > > first
> > > > > function pass on that SCC, which may add an edge (e.g.
> > > > > promote
> > > > > indirect call to direct call) that may enlarge the SCC. Do we
> > > > > continue running the remaining function passes? Do we
> > > > > re-visit
> > > > > this
> > > > > new enlarged SCC? (if so, when?) These are the types of
> > > > > questions
> > > > > that motivated this post (hence the name "Intended behavior
> > > > > of
> > > > > CGSCC
> > > > > pass manager.").
> > > > 
> > > 
> > 
> 

> > > > yes, I understand the motivation -- that is way I propose the
> > > > agorithm that uses cached iteration order + worklist based
> > > > iterative
> > > > approach. In your example, when a new edge is exposed via
> > > > devirtualization, only its caller/ancestor nodes need to be
> > > > revisited in the next iteration.
> > > 
> > 
> 
> > > Your proposal certainly sounds simpler and easier to reason
> > > about.
> > > (e.g. it naturally avoids any issues with deletion)
> > 
> 

> > > Since there are only 3 non-inliner CGSCC passes (ArgPromotion,
> > > PostOrderFunctionAttrsLegacyPass, PruneEH) it may be feasible to
> > > remove the entire abstraction from LLVM and replace it with a
> > > cached
> > > post-order function pass visitation like you suggest. The inliner
> > > doesn't seem to do anything special with the knowledge that it is
> > > visiting an SCC (besides moving call sites that call within the
> > > SCC
> > > to the end of its worklist) and so this may be fine.
> > 
> 

> > A) Using cached callgraph walk (nodes or SCC) to avoid cgraph
> > invalidation and B) whether SCC based walk has more advantages are
> > two different things to answer.
> 

> > For the later, I now tend to believe that introducing additional
> > SCC
> > layer has more advantages than alternative that uses node based
> > bottom-up walk.
> 
> > 1) Bottom-up based IPA algorithms can be implemented more cleanly
> > by
> > using SCC. For instance the pruneEH or any bottom-up attribute
> > propagations. Without using SCC, we will need to defined a generic
> > (using template) worklist based bottom-up propagation engine.
> 
> > 2) All SCC passes can be nicely grouped together -- for compile
> > time
> > reasons or to create synergies for better performance.
> 

> > For A), to repeat myself, SCC DAG caching to avoid mutation for one
> > round of CG walk has many advantages. We should probably discuss
> > more on pros/cons and various scenarios we want to handle ..
> 
> One question is what invariants we want to provide for the
> visitation.

> For example, should a CGSCC pass be able to assume that all "child"
> SCC's (SCC's it can reach via direct calls emanating from the SCC
> being visited) have already been visited? Currently I don't think it
> can, and IIRC from the discussion at the social this is one thing
> that Chandler is hoping to fix. The "ref edge" notion in
> LazyCallGraph ensures that we cannot create a call edge
> (devirtualizing a ref edge) that will point at an SCC that has not
> yet been visited.

> E.g. consider this graph:

> digraph G {
> A -> B; B -> A; // SCC {A,B}
> S -> T; T -> S; // SCC {S,T}
> X -> Y; Y -> X; // SCC {X,Y}

> B -> X;
> B -> S;
> T -> Y [label="Ref edge that is devirtualized\nwhen visiting SCC
> {S,T}",constraint=false,style=dashed]
> }

> (can visualize conveniently at http://www.webgraphviz.com/ or I have
> put an image at http://reviews.llvm.org/F2073104 )

> If we do not consider the ref graph, then it is possible for SCC
> {S,T} to be visited before SCC {X,Y}. So after devirtualizing the
> call T->Y we can no longer assume that the SCC's are visited in
> post-order (or must somehow try to ignore that the call edge T->Y
> exists).

> A more complicated case is when SCC {S,T} and SCC {X,Y} both call
> into each other via function pointers. So eventually after
> devirtualizing the calls in both directions there will be a single
> SCC {S,T,X,Y}.

> digraph G {
> A -> B; B -> A; // SCC {A,B}
> S -> T; T -> S; // SCC {S,T}
> X -> Y; Y -> X; // SCC {X,Y}

> B -> X;
> B -> S;
> T -> Y [label="Ref edge that is devirtualized\nwhen visiting SCC
> {S,T}",constraint=false,style=dashed]
> X -> S [label="Ref edge that is devirtualized\nwhen visiting SCC
> {X,Y}",constraint=false,style=dashed]
> }

> (rendering at: http://reviews.llvm.org/F2073479 )

> Due to the cyclic dependence there is no SCC visitation order that
> can directly provide the invariant above. Indeed, the problem of
> maintaining the above invariant is ill-posed in this scenario.
To clarify, we're trying to provide this invariant on the "ref" graph or on the graph with direct calls only? I think the invariant need only apply to the former if we're relying on this for correctness (i.e. an analysis must visit all callees before visiting the callers). 

-Hal 

> Consider the pipeline `cgscc(function(...simplifications that can
> devirtualize...),foo-cgscc-pass)`. A possible visitation is as
> follows:

> 1. Visit SCC {S,T} and run `function(...simplifications that can
> devirtualize...)`. This reveals the call edge T->Y.
> 2. We continue visiting SCC {S,T} and run foo-cgscc-pass on SCC
> {S,T}.
> 3. Visit SCC {X,Y} and run `function(...simplifications that can
> devirtualize...)`. This reveals the call edge X->S.
> 4. ??? what do we do now.
> Alternative 4.a) Should we continue the visitation and call
> foo-cgscc-pass on "SCC" {X,Y}?
> Alternative 4.b) Should foo-cgscc-pass now run on SCC {S,T,X,Y}?
> Alternative 4.c) Should we restart the entire outer `cgscc(...)`
> visitation on SCC {S,T,X,Y}?

> (Without a cap both 4.b and 4.c could become quadratic on a graph
> like http://reviews.llvm.org/F2073607 )

> -- Sean Silva

> > thanks,
> 

> > David
> 

> > > Sean:~/pg/llvm % git grep 'public CallGraphSCCPass'
> > 
> 
> > > include/llvm/Transforms/IPO/InlinerPass.h:struct Inliner : public
> > > CallGraphSCCPass {
> > 
> 
> > > lib/Transforms/IPO/ArgumentPromotion.cpp: struct ArgPromotion :
> > > public CallGraphSCCPass {
> > 
> 
> > > lib/Transforms/IPO/FunctionAttrs.cpp:struct
> > > PostOrderFunctionAttrsLegacyPass : public CallGraphSCCPass {
> > 
> 
> > > lib/Transforms/IPO/PruneEH.cpp: struct PruneEH : public
> > > CallGraphSCCPass {
> > 
> 
> > > lib/Analysis/CallGraphSCCPass.cpp: class PrintCallGraphPass :
> > > public
> > > CallGraphSCCPass {
> > 
> 

> > > tools/opt/PassPrinters.cpp:struct CallGraphSCCPassPrinter :
> > > public
> > > CallGraphSCCPass {
> > 
> 

> > > CGSCC passes seem to have been added in what is now SVN r8247
> > > (~Aug
> > > 2003)
> > > http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20030825/006619.html
> > > (LLVM appears to have been in CVS at the time).
> > 
> 

> > > Chris, do you remember the motivation for doing the CGSCC
> > > visitation
> > > instead of a pure post-order function visitation like David is
> > > mentioning? (or was it just an oversight / hindsight-20-20
> > > thing?)
> > > Do you think it would make sense to replace CGSCC visitation with
> > > post-order function visitation in the current LLVM?
> > 
> 

> > > -- Sean Silva
> > 
> 

> > > > David
> > > 
> > 
> 

> > > > > -- Sean Silva
> > > > 
> > > 
> > 
> 

> > > > > > David
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > Sorry for the wall of text.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > -- Sean Silva
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160616/1ecf876e/attachment-0001.html>