[llvm-dev] Intended behavior of CGSCC pass manager.

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 8 14:52:14 PDT 2016


> On Jun 8, 2016, at 12:36 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi,
> 
> Does it make sense to change RefSCC to hold a list of the
> RefSCC-DAG-Roots that were split out of it because of edge deletion?
> Then one way to phrase the inliner/function pass iteration would be
> (assuming I understand the issues):
> 
>  Stack.push(RefSCC_Leaves);
>  while (!Stack.empty()) {
>    RefSCC = Stack.pop();
>    InlineCallSites(RefSCC);
>    if (!RefSCC.splitOutSCCs.empty())
>      goto repush;
>    for each func in RefSCC:
>      FPM.run(func);

I'm not sure what you mean above, but IIUC I think it is a bit more complex since the inliner runs at the same place you put FPM.run().

-- 
Mehdi

  


>      if (!RefSCC.splitOutSCCs.empty())
>        goto repush;
>    continue;
> repush:
>     for (refscc_dag_root in RefSCC.splitOutSCCs)
>       // Here we don't want to push every leaf, only leaves that have
>       // functions that haven't had the FPM run on them (maybe we can
>       // do this by maintaining a set?).
>       // If we don't push a leaf, we push its parent (which we want to
>       // push even if we've run FPM on it, since we'd like to re-run
>       // the inliner on it).
>       refscc_dag_root.push_leaves_to(Stack);
>   }
> 
> 
> (I know this isn't ideal, since now RefSCC is no longer "just a data
> structure", but actually carries incidental information.)
> 
> 
> On Wed, Jun 8, 2016 at 12:07 PM, Finkel, Hal J. via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On Jun 8, 2016 1:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> 
>>> 
>>>> On Jun 8, 2016, at 9:32 AM, Hal Finkel via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> 
>>>> ________________________________
>>>>> 
>>>>> From: "Sean Silva via llvm-dev" <llvm-dev at lists.llvm.org>
>>>>> To: "llvm-dev" <llvm-dev at lists.llvm.org>
>>>>> Sent: Wednesday, June 8, 2016 6:19:03 AM
>>>>> Subject: [llvm-dev] Intended behavior of CGSCC pass manager.
>>>>> 
>>>>> Hi Chandler, Philip, Mehdi, (and llvm-dev,)
>>>>> 
>>>>> (this is partially a summary of some discussions that happened at the
>>>>> last LLVM bay area social, and partially a discussion about the direction of
>>>>> the CGSCC pass manager)
>>>>> 
>>>>> 
>>>>> At the last LLVM social we discussed the progress on the CGSCC pass
>>>>> manager. It seems like Chandler has a CGSCC pass manager working, but it
>>>>> is still unresolved exactly which semantics we want that are reasonably
>>>>> implementable (more about this below).
>>>>> 
>>>>> AFAICT, there has been no public discussion about what exact semantics
>>>>> we ultimately want to have. We should figure that out.
>>>>> 
>>>>> The main difficulty which Chandler described is the apparently quite
>>>>> complex logic surrounding needing to run function passes nested within an
>>>>> SCC pass manager, while providing some guarantees about exactly what order
>>>>> the function passes are run. The existing CGSCC pass manager just punts on
>>>>> some of the problems that arise (look in CGPassManager::runOnModule,
>>>>> CGPassManager::RunAllPassesOnSCC, and CGPassManager::RunPassOnSCC in
>>>>> llvm/lib/Analysis/CallGraphSCCPass.cpp), and these are the problems that
>>>>> Chandler has been trying to solve.
>>>>> 
>>>>> (
>>>>> Why is this "function passes inside CGSCC passes" stuff interesting?
>>>>> Because LLVM can do inlining on an SCC (often just a single function) and
>>>>> then run function passes to simplify the function(s) in the SCC before it
>>>>> tries to inline into a parent SCC. (the SCC visitation order is post-order)
>>>>> For example, we may inline a bunch of code, but after inlining we can
>>>>> tremendously simplify the function, and we want to do so before considering
>>>>> this function for inlining into its callers so that we get an accurate
>>>>> evaluation of the inline cost.
>>>>> Based on what Chandler said, it seems that LLVM is fairly unique in this
>>>>> regard and other compilers don't do this (which is why we can't just look at
>>>>> how other compilers solve this problem; they don't have this problem (maybe
>>>>> they should? or maybe we shouldn't?)). For example, he described that GCC
>>>>> uses different inlining "phases"; e.g. it does early inlining on the entire
>>>>> module, then does simplifications on the entire module, then does late
>>>>> inlining on the entire module; so it is not able to incrementally simplify
>>>>> as it inlines like LLVM does.
>>>> 
>>>> This incremental simplification is an important feature of our inliner,
>>>> and one we should endeavor to keep. We might also want different phases at
>>>> some point (e.g. a top-down and a bottom-up phase), but that's another
>>>> story.
>>>>> 
>>>>> )
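>>>>> 
>>>>> (For reference, the shape of that bottom-up visit, sketched with the old
>>>>> CallGraph and llvm::scc_begin, which is roughly how the existing CGSCC
>>>>> pass manager walks the graph; inlineIntoSCC and simplify are hypothetical
>>>>> stand-ins for the inliner and the nested function passes:)
>>>>> 
>>>>>   #include "llvm/ADT/SCCIterator.h"
>>>>>   #include "llvm/Analysis/CallGraph.h"
>>>>>   #include <vector>
>>>>>   using namespace llvm;
>>>>> 
>>>>>   void inlineIntoSCC(const std::vector<CallGraphNode *> &SCC);
>>>>>   void simplify(Function &F);
>>>>> 
>>>>>   void runBottomUp(CallGraph &CG) {
>>>>>     // scc_iterator visits the SCCs of the call graph in post-order,
>>>>>     // i.e. callees before their callers.
>>>>>     for (auto I = scc_begin(&CG); !I.isAtEnd(); ++I) {
>>>>>       const std::vector<CallGraphNode *> &SCC = *I;
>>>>>       inlineIntoSCC(SCC);        // inline call sites in this SCC
>>>>>       for (CallGraphNode *N : SCC)
>>>>>         if (Function *F = N->getFunction())
>>>>>           simplify(*F);          // simplify now, before any caller's SCC
>>>>>                                  // computes inline costs for F
>>>>>     }
>>>>>   }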
>>>>> 
>>>>> As background for what is below, the LazyCallGraph tracks two graphs:
>>>>> the "call graph" and the "ref graph".
>>>>> Conceptually, the call graph is the graph of direct calls, where
>>>>> indirect calls and calls to external functions do not appear (or are
>>>>> connected to dummy nodes). The ref graph is basically the graph of all
>>>>> functions transitively accessible based on the globals/constants/etc.
>>>>> referenced by a function (e.g. if a function `foo` references a vtable that
>>>>> is defined in the module, there is an edge in the ref graph from `foo` to
>>>>> every function in the vtable).
>>>>> The call graph is a strict subset of the ref graph.
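>>>>> 
>>>>> (Roughly how the ref edges of a function can be found -- this is just the
>>>>> idea, not the LazyCallGraph code: walk every constant the function
>>>>> references, transitively, and record any Function you reach:)
>>>>> 
>>>>>   #include "llvm/ADT/SmallPtrSet.h"
>>>>>   #include "llvm/ADT/SmallVector.h"
>>>>>   #include "llvm/IR/Constants.h"
>>>>>   #include "llvm/IR/Function.h"
>>>>>   #include "llvm/IR/InstIterator.h"
>>>>>   using namespace llvm;
>>>>> 
>>>>>   void collectRefEdges(Function &F, SmallPtrSetImpl<Function *> &Refs) {
>>>>>     SmallVector<Constant *, 16> Worklist;
>>>>>     SmallPtrSet<Constant *, 16> Visited;
>>>>>     // Seed with every constant operand of every instruction in F.
>>>>>     for (Instruction &I : instructions(F))
>>>>>       for (Value *Op : I.operands())
>>>>>         if (auto *C = dyn_cast<Constant>(Op))
>>>>>           if (Visited.insert(C).second)
>>>>>             Worklist.push_back(C);
>>>>>     while (!Worklist.empty()) {
>>>>>       Constant *C = Worklist.pop_back_val();
>>>>>       if (auto *Callee = dyn_cast<Function>(C)) {
>>>>>         if (!Callee->isDeclaration())
>>>>>           Refs.insert(Callee);   // ref edge: F -> Callee
>>>>>         continue;
>>>>>       }
>>>>>       // Recurse through aggregate constants, e.g. a vtable initializer.
>>>>>       for (Value *Op : C->operands())
>>>>>         if (auto *OpC = dyn_cast<Constant>(Op))
>>>>>           if (Visited.insert(OpC).second)
>>>>>             Worklist.push_back(OpC);
>>>>>     }
>>>>>   }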
>>>>> 
>>>>> Chandler described that he had a major breakthrough in that the CGSCC
>>>>> pass manager only had to deal with 3 classes of modifications that can
>>>>> occur:
>>>>> - a pass may e.g. propagate a load of a function pointer into an
>>>>> indirect call, turning it into a direct call. This requires adding an edge
>>>>> in the CG but not in the ref graph.
>>>>> - a pass may take a direct call and turn it into an indirect call. This
>>>>> requires removing an edge from the CG, but not in the ref graph.
>>>>> - a pass may delete a direct call. This removes an edge in the CG and
>>>>> also in the ref graph.
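>>>>> 
>>>>> (In terms of graph-edge operations, the three cases look roughly like
>>>>> this; GraphUpdate and CallAndRefGraph are made-up names for illustration,
>>>>> and the real updates go through LazyCallGraph:)
>>>>> 
>>>>>   namespace llvm { class Function; }
>>>>>   using llvm::Function;
>>>>> 
>>>>>   enum class GraphUpdate {
>>>>>     IndirectToDirect, // devirtualized: add a call edge (ref edge exists)
>>>>>     DirectToIndirect, // lost the callee: drop the call edge, keep the ref
>>>>>     CallDeleted       // call gone entirely: drop both edges
>>>>>   };
>>>>> 
>>>>>   struct CallAndRefGraph {
>>>>>     void addCallEdge(Function *Caller, Function *Callee);
>>>>>     void removeCallEdge(Function *Caller, Function *Callee);
>>>>>     void removeRefEdge(Function *Caller, Function *Callee);
>>>>>   };
>>>>> 
>>>>>   void applyUpdate(CallAndRefGraph &G, GraphUpdate Kind,
>>>>>                    Function *Caller, Function *Callee) {
>>>>>     switch (Kind) {
>>>>>     case GraphUpdate::IndirectToDirect:
>>>>>       G.addCallEdge(Caller, Callee);    // may merge call-graph SCCs
>>>>>       break;
>>>>>     case GraphUpdate::DirectToIndirect:
>>>>>       G.removeCallEdge(Caller, Callee); // may split a call-graph SCC
>>>>>       break;
>>>>>     case GraphUpdate::CallDeleted:
>>>>>       G.removeCallEdge(Caller, Callee); // may split a call-graph SCC...
>>>>>       G.removeRefEdge(Caller, Callee);  // ...and a RefSCC as well
>>>>>       break;
>>>>>     }
>>>>>   }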
>>>>> 
>>>>> From the perspective of the CGSCC pass manager, these operations can
>>>>> affect the SCC structure. Adding an edge might merge SCC's and deleting an
>>>>> edge might split SCC's. Chandler mentioned that apparently the issues of
>>>>> splitting and merging SCC's within the current infrastructure are actually
>>>>> quite challenging and lead to e.g. iterator invalidation issues, and that is
>>>>> what he is working on.
>>>>> 
>>>>> (
>>>>> The ref graph is important to guide the overall SCC visitation order
>>>>> because it basically represents "the largest graph that the CG may turn into
>>>>> due to our static analysis of this module". I.e. no transformation we can
>>>>> statically make in the CGSCC passes can ever cause us to need to merge SCC's
>>>>> in the ref graph.
>>>>> )
>>>>> 
>>>>> 
>>>>> 
>>>>> I have a couple overall questions/concerns:
>>>>> 
>>>>> 
>>>>> 1. The ref graph can easily go quadratic. E.g.
>>>>> 
>>>>> typedef void (*fp)();
>>>>> fp funcs[] = {
>>>>>  &foo1,
>>>>>  &foo2,
>>>>>  ...
>>>>>  &fooN
>>>>> };
>>>>> void foo1() { funcs[something](); }
>>>>> void foo2() { funcs[something](); }
>>>>> ...
>>>>> void fooN() { funcs[something](); }
>>>>> 
>>>>> (Each fooI references the `funcs` array, and `funcs` references all N
>>>>> functions, so the ref graph ends up with on the order of N^2 edges.)
>>>>> One real-world case where this might come about is in the presence of
>>>>> vtables.
>>>>> 
>>>>> The existing CGSCC pass manager does not have this issue AFAIK because
>>>>> it does not consider the ref graph.
>>>>> 
>>>>> Does anybody have any info/experience about how densely connected the
>>>>> ref graph can get in programs that might reasonably be fed to the compiler?
>>>>> I just did a quick sanity check with LLD/ELF using
>>>>> `--lto-newpm-passes=cgscc(no-op-cgscc)` and it at least seemed to terminate
>>>>> / not run out of memory. Based on some rough calculations looking at the
>>>>> profile, it seems like the entire run of the inliner in the old LTO pipeline
>>>>> (which is about 5% of total LTO time on this particular example I looked at)
>>>>> is only 2-3x as expensive as just `--lto-newpm-passes=cgscc(no-op-cgscc)`,
>>>>> so the LazyCallGraph construction is certainly not free.
>>>>> 
>>>>> 
>>>>> 2. What is the intended behavior of CGSCC passes when SCC's are split or
>>>>> merged? E.g. a CGSCC pass runs on an SCC (e.g. the inliner). Now we run some
>>>>> function passes nested inside the CGSCC pass manager (e.g. to simplify
>>>>> things after inlining). Consider:
>>>> 
>>>> This is not how I thought the current scheme worked ;) -- I was under the
>>>> impression that we had a call graph with conservatively-connected dummy
>>>> nodes for external/indirect functions.
>>> 
>>> 
>>> The fact is that we have separate nodes for calling into an external
>>> function and for "being called" from an external function; these don't
>>> form an SCC. So it means we can end up merging SCCs, IIUC.
>> 
>> Yes, although I thought there was only one dummy node for those things.
>> 
>> -Hal
>> 
>>> 
>>> --
>>> Mehdi
>>> 
>>> 
>>> 
>>>> As a result, there is no semantics-preserving optimization that will
>>>> merge SCCs, only split them. In that case, I'd expect that once an SCC is
>>>> split, we re-run the CGSCC passes over the newly-separated SCCs. But this
>>>> corresponds to running over the "ref graph", as you describe it. I don't
>>>> understand why we want the non-conservative graph.
>>>> 
>>>>> 
>>>>> a) These function passes are e.g. now able to devirtualize a call,
>>>>> adding an edge to the CG, forming a larger CG SCC. Do you re-run the CGSCC
>>>>> pass (say, the inliner) on this larger SCC?
>>>>> 
>>>>> b) These function passes are e.g. able to DCE a call, removing an edge
>>>>> from the CG. This converts, say, a CG SCC which is a cycle graph (like
>>>>> a->b->c->a) into a path graph (a->b->c, with no edge back to a). The inliner
>>>>> had already visited a, b, and c as a single SCC. Now does it have to
>>>>> re-visit c, then b, then a, as single-node SCC's?
>>>>> 
>>>>> 
>>>>> btw:
>>>>> 
>>>>> One way that I have found it useful to think about this is in terms of
>>>>> the visitation during Tarjan's SCC algorithm. I'll reference the pseudocode
>>>>> in
>>>>> https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm.
>>>>> Inside the "strongconnect" routine when we have identified an SCC (the true
>>>>> branch of the `if (v.lowlink = v.index)` test) we can visit
>>>>> stack[v.index:stack.size()] as an SCC. This may or may not invalidate some
>>>>> things on the stack (the variable `S` in the pseudocode) and we may need to
>>>>> fix it up (e.g. inlining deleted a function, so we can't have an entry on
>>>>> the stack). Then, we can run function passes as we pop individual functions
>>>>> off the stack, but it is easier to think about IMO than merging of SCC data
>>>>> structures: if we add edges to the CG then we have to do more DFS on the new
>>>>> edges and if we delete edges then the DFS order of the stack gives us
>>>>> certain guarantees.
>>>>> Personally I find this much easier to reason about than the description
>>>>> in terms of splitting and merging SCC's in the CG and ref graph (which the
>>>>> LazyCallGraph API forces one to think about, since it hides the underlying
>>>>> Tarjan's algorithm).
>>>>> The LazyCallGraph API makes the current loop in
>>>>> http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h;272124$100
>>>>> very clean, but at least for my thinking about the problem, it seems like
>>>>> the wrong abstraction (and most of the LazyCallGraph API seems to be unused,
>>>>> so it seems like it may be overly heavyweight).
>>>>> E.g. I think that maybe the easiest thing to do is to turn the current
>>>>> approach inside out: instead of having the pass manager logic be the "normal
>>>>> code" and forcing the Tarjan algorithm to become a state machine of
>>>>> iterators, use an open-coded Tarjan algorithm with some callbacks and make
>>>>> the pass management logic be the state machine.
>>>>> This will also open the door to avoiding the potentially quadratic size
>>>>> of the ref graph, since e.g. in the example I gave above, we can mark the
>>>>> `funcs` array itself as already having been visited during the walk. In the
>>>>> current LazyCallGraph, this would require adding some sort of notion of
>>>>> hyperedge.
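>>>>> 
>>>>> (A self-contained sketch of what I mean by an open-coded Tarjan with a
>>>>> callback -- the graph here is just adjacency lists indexed by node id,
>>>>> and TarjanSCC/OnSCC are made-up names, not anything in tree:)
>>>>> 
>>>>>   #include <algorithm>
>>>>>   #include <functional>
>>>>>   #include <vector>
>>>>> 
>>>>>   struct TarjanSCC {
>>>>>     const std::vector<std::vector<int>> &Adj;
>>>>>     std::function<void(const std::vector<int> &)> OnSCC;
>>>>>     std::vector<int> Index, Lowlink, Stack;
>>>>>     std::vector<bool> OnStack;
>>>>>     int NextIndex = 0;
>>>>> 
>>>>>     TarjanSCC(const std::vector<std::vector<int>> &Adj,
>>>>>               std::function<void(const std::vector<int> &)> Callback)
>>>>>         : Adj(Adj), OnSCC(std::move(Callback)), Index(Adj.size(), -1),
>>>>>           Lowlink(Adj.size(), 0), OnStack(Adj.size(), false) {}
>>>>> 
>>>>>     void run() {
>>>>>       for (int V = 0, E = (int)Adj.size(); V != E; ++V)
>>>>>         if (Index[V] == -1)
>>>>>           strongConnect(V);
>>>>>     }
>>>>> 
>>>>>     void strongConnect(int V) {
>>>>>       Index[V] = Lowlink[V] = NextIndex++;
>>>>>       Stack.push_back(V);
>>>>>       OnStack[V] = true;
>>>>>       for (int W : Adj[V]) {
>>>>>         if (Index[W] == -1) {
>>>>>           strongConnect(W);
>>>>>           Lowlink[V] = std::min(Lowlink[V], Lowlink[W]);
>>>>>         } else if (OnStack[W]) {
>>>>>           Lowlink[V] = std::min(Lowlink[V], Index[W]);
>>>>>         }
>>>>>       }
>>>>>       if (Lowlink[V] == Index[V]) {
>>>>>         // V roots an SCC: pop everything above it off the stack. This is
>>>>>         // where the pass-management "state machine" would run, e.g. the
>>>>>         // inliner on the SCC and then the function passes.
>>>>>         std::vector<int> SCC;
>>>>>         int W;
>>>>>         do {
>>>>>           W = Stack.back();
>>>>>           Stack.pop_back();
>>>>>           OnStack[W] = false;
>>>>>           SCC.push_back(W);
>>>>>         } while (W != V);
>>>>>         OnSCC(SCC);
>>>>>       }
>>>>>     }
>>>>>   };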
>>>> 
>>>> FWIW, I see no purpose in abstracting one algorithm, especially if that
>>>> makes things algorithmically harder. Also, the LazyCallGraph abstraction and
>>>> the iterator abstraction seem like separate issues. Iterator abstractions
>>>> are often useful because you can use them in generic algorithms, etc.
>>>>> 
>>>>> 
>>>>> Since this is such a high priority (due to blocking PGO inlining), I
>>>>> will probably try my hand at implementing the CGSCC pass manager sometime
>>>>> soon unless somebody beats me to it. (I'll probably try the "open-coded SCC
>>>>> visit" approach).
>>>>> 
>>>>> Another possibility is implementing the new CGSCC pass manager that uses
>>>>> the same visitation semantics as the one in the old PM, and then we can
>>>>> refactor that as needed. In fact, that may be the best approach so that
>>>>> porting to the new PM is as NFC as possible and we can isolate the
>>>>> functional (i.e., need benchmarks, measurements ...) changes in separate
>>>>> commits.
>>>> 
>>>> I'm in favor of this approach for exactly the reason you mention. Being
>>>> able to bisect regressions to the algorithmic change, separate from the
>>>> infrastructure change, will likely make things easier in the long run (and
>>>> will avoid the problem, to the extent possible, of performance regressions
>>>> blocking the pass-manager work).
>>>>> 
>>>>> 
>>>>> Sorry for the wall of text.
>>>> 
>>>> No problem; much appreciated.
>>>> 
>>>> -Hal
>>>> 
>>>>> 
>>>>> -- Sean Silva
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Sanjoy Das
> http://playingwithpointers.com


