[llvm-dev] Intended behavior of CGSCC pass manager.

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 8 15:22:48 PDT 2016

On Wed, Jun 8, 2016 at 10:57 AM, Xinliang David Li <davidxl at google.com>

> On Wed, Jun 8, 2016 at 4:19 AM, Sean Silva <chisophugis at gmail.com> wrote:
>> Hi Chandler, Philip, Mehdi, (and llvm-dev,)
>> (this is partially a summary of some discussions that happened at the
>> last LLVM bay area social, and partially a discussion about the direction
>> of the CGSCC pass manager)
>> A the last LLVM social we discussed the progress on the CGSCC pass
>> manager. It seems like Chandler has a CGSCC pass manager working, but it is
>> still unresolved exactly which semantics we want (more about this below)
>> that are reasonably implementable.
>> AFAICT, there has been no public discussion about what exact semantics we
>> ultimately want to have. We should figure that out.
>> The main difficulty which Chandler described is the apparently quite
>> complex logic surrounding needing to run function passes nested within an
>> SCC pass manager, while providing some guarantees about exactly what order
>> the function passes are run. The existing CGSCC pass manager just punts on
>> some of the problems that arise (look in CGPassManager::runOnModule,
>> CGPassManager::RunAllPassesOnSCC, and CGPassManager::RunPassOnSCC in
>> llvm/lib/Analysis/CallGraphSCCPass.cpp), and these are the problems that
>> Chandler has been trying to solve.
>> (
>> Why is this "function passes inside CGSCC passes" stuff interesting?
>> Because LLVM can do inlining on an SCC (often just a single function) and
>> then run function passes to simplify the function(s) in the SCC before it
>> tries to inline into a parent SCC. (the SCC visitation order is post-order)
>> For example, we may inline a bunch of code, but after inlining we can
>> tremendously simplify the function, and we want to do so before considering
>> this function for inlining into its callers so that we get an accurate
>> evaluation of the inline cost.
>> Based on what Chandler said, it seems that LLVM is fairly unique in this
>> regard and other compilers don't do this (which is why we can't just look
>> at how other compilers solve this problem; they don't have this problem
>> (maybe they should? or maybe we shouldn't?)). For example, he described
>> that GCC uses different inlining "phases"; e.g. it does early inlining on
>> the entire module, then does simplifications on the entire module, then
>> does late inlining on the entire module; so it is not able to incrementally
>> simplify as it inlines like LLVM does.
>> )
> I want to clarify a little more on GCC's behavior.  GCC has two types of
> ipa_passes. One called simple_ipa pass and the other is called regular
> ipa_pass.   A simple IPA pass does not transform IR by itself, but it can
> have function level sub-passes that does transformation. If it has function
> sub-passes, the function passes will be executed in bottom up order just
> like LLVM's CGSCC pass.   A regular ipa pass is somewhat like the module
> pass in LLVM.
> GCC's pass pipeline is actually similar to LLVM's pipeline overall: it has
> the following components:
> (1) lowering passes
> (2) small/simple IPA passes including the early optimization pipeline
> (bottom-up)
> (3) full ipa pipelines
> (4) post-ipa optimization and code generation.
> The difference is that LLVM's (2) includes only inst-combine and simplfy
> CFG, but GCC's (2) is larger which happens also includes an early inlining
> pass.

So the early inlining is a function pass? In LLVM I think that the function
pass contract would not allow inlining to occur in a function pass (the
contract is theoretically designed to allow function passes to run
concurrently on functions within a module).

>   GCC's (2) is similar to LLVM's CGSCC pass but with fewer passes. The
> inliner is also targeting tiny functions that reduces size overall.
> LLVM's LTO pipeline is kind of like this too.
> GCC's (3) includes a full blown regular inlining pass. This inliner uses
> the priority order based algorithm, so bottom-up traversal does not make
> sense. It requires (2) to have pass to collect summary to drive the
> decisions (as well as profile data).

Thanks for this clarification; it is always good to know what other
compilers do for comparison.

-- Sean Silva

> David
>> As background for what is below, the LazyCallGraph tracks two graphs: the
>> "call graph" and the "ref graph".
>> Conceptually, the call graph is the graph of direct calls, where indirect
>> calls and calls to external functions do not appear (or are connected to
>> dummy nodes). The ref graph is basically the graph of all functions
>> transitively accessible based on the globals/constants/etc. referenced by a
>> function (e.g. if a function `foo` references a vtable that is defined in
>> the module, there is an edge in the ref graph from `foo` to every function
>> in the vtable).
>> The call graph is a strict subset of the ref graph.
>> Chandler described that he had a major breakthrough in that the CGSCC
>> pass manager only had to deal with 3 classes of modifications that can
>> occur:
>> - a pass may e.g. propagate a load of a function pointer into an indirect
>> call, turning it into an direct call. This requires adding an edge in the
>> CG but not in the ref graph.
>> - a pass may take a direct call and turn it into an indirect call. This
>> requires removing an edge from the CG, but not in the ref graph.
>> - a pass may delete a direct call. This removes an edge in the CG and
>> also in the ref graph.
>> From the perspective of the CGSCC pass manager, these operations can
>> affect the SCC structure. Adding an edge might merge SCC's and deleting an
>> edge might split SCC's. Chandler mentioned that apparently the issues of
>> splitting and merging SCC's within the current infrastructure are actually
>> quite challenging and lead to e.g. iterator invalidation issues, and that
>> is what he is working on.
>> (
>> The ref graph is important to guide the overall SCC visitation order
>> because it basically represents "the largest graph that the CG may turn
>> into due to our static analysis of this module". I.e. no transformation we
>> can statically make in the CGSCC passes can ever cause us to need to merge
>> SCC's in the ref graph.
>> )
>> I have a couple overall questions/concerns:
>> 1. The ref graph can easily go quadratic. E.g.
>> typedef void (*fp)();
>> fp funcs[] = {
>>   &foo1,
>>   &foo2,
>>   ...
>>   &fooN
>> }
>> void foo1() { funcs[something](); }
>> void foo2() { funcs[something](); }
>> ...
>> void fooN() { funcs[something](); }
>> One real-world case where this might come about is in the presence of
>> vtables.
>> The existing CGSCC pass manager does not have this issue AFAIK because it
>> does not consider the ref graph.
>> Does anybody have any info/experience about how densely connected the ref
>> graph can get in programs that might reasonably be fed to the compiler?
>> I just did a quick sanity check with LLD/ELF using
>> `--lto-newpm-passes=cgscc(no-op-cgscc)` and it at least seemed to terminate
>> / not run out of memory. Based on some rough calculations looking at the
>> profile, it seem like the entire run of the inliner in the old LTO pipeline
>> (which is about 5% of total LTO time on this particular example I looked
>> at) is only 2-3x as expensive as just
>> `--lto-newpm-passes=cgscc(no-op-cgscc)`, so the LazyCallGraph construction
>> is certainly not free.
>> 2. What is the intended behavior of CGSCC passes when SCC's are split or
>> merged? E.g. a CGSCC pass runs on an SCC (e.g. the inliner). Now we run
>> some function passes nested inside the CGSCC pass manager (e.g. to simplify
>> things after inlining). Consider:
>> a) These function passes are e.g. now able to devirtualize a call, adding
>> an edge to the CG, forming a larger CG SCC. Do you re-run the CGSCC pass
>> (say, the inliner) on this larger SCC?
>> b) These function passes are e.g. able to DCE a call, removing an edge
>> from the CG. This converts, say, a CG SCC which is a cycle graph (like
>> a->b->c->a) into a path graph (a->b->c, with no edge back to a). The
>> inliner had already visited a, b, and c as a single SCC. Now does it have
>> to re-visit c, then b, then a, as single-node SCC's?
>> btw:
>> One way that I have found it useful to think about this is in terms of
>> the visitation during Tarjan's SCC algorithm. I'll reference the pseudocode
>> in
>> https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm.
>> Inside the "strongconnect" routine when we have identified an SCC (the true
>> branch of `if (v.lowlink = v.index)` test ) we can visit
>> stack[v.index:stack.size()] as an SCC. This may or may not invalidate some
>> things on the stack (the variable `S` in the pseudocode) and we may need to
>> fix it up (e.g. inlining deleted a function, so we can't have an entry on
>> the stack). Then, we can run function passes as we pop individual functions
>> off the stack, but it is easier to think about IMO than merging of SCC data
>> structures: if we add edges to the CG then we have to do more DFS on the
>> new edges and if we delete edges then the DFS order of the stack gives us
>> certain guarantees.
>> Personally I find this much easier to reason about than the description
>> in terms of splitting and merging SCC's in the CG and ref graph (which the
>> LazyCallGraph API makes one to think about since it hides the underlying
>> Tarjan's algorithm).
>> The LazyCallGraph API makes the current loop in
>> http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h;272124$100
>> very clean, but at least for my thinking about the problem, it seems like
>> the wrong abstraction (and most of the LazyCallGraph API seems to be
>> unused, so it seems like it may be overly heavyweight).
>> E.g. I think that maybe the easiest thing to do is to turn the current
>> approach inside out: instead of having the pass manager logic be the
>> "normal code" and forcing the Tarjan algorithm to become a state machine of
>> iterators, use an open-coded Tarjan algorithm with some callbacks and make
>> the pass management logic be the state machine.
>> This will also open the door to avoiding the potentially quadratic size
>> of the ref graph, since e.g. in the example I gave above, we can mark the
>> `funcs` array itself as already having been visited during the walk. In the
>> current LazyCallGraph, this would require adding some sort of notion of
>> hyperedge.
>> Since this is such a high priority (due to blocking PGO inlining), I will
>> probably try my hand at implementing the CGSCC pass manager sometime soon
>> unless somebody beats me to it. (I'll probably try the "open-coded SCC
>> visit" approach).
>> Another possibility is implementing the new CGSCC pass manager that uses
>> the same visitation semantics as the one in the old PM, and then we can
>> refactor that as needed. In fact, that may be the best approach so that
>> porting to the new PM is as NFC as possible and we can isolate the
>> functional (i.e., need benchmarks, measurements ...) changes in separate
>> commits.
>> Sorry for the wall of text.
>> -- Sean Silva
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160608/77063a75/attachment-0001.html>

More information about the llvm-dev mailing list