[llvm-dev] Intended behavior of CGSCC pass manager.

Wed Jun 8 16:48:35 PDT 2016

On Wed, Jun 8, 2016 at 3:10 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Jun 8, 2016, at 2:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
> On Wed, Jun 8, 2016 at 9:39 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>>
>>
>> On Wed, Jun 8, 2016 at 4:19 AM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>>> Hi Chandler, Philip, Mehdi, (and llvm-dev,)
>>>
>>> (this is partially a summary of some discussions that happened at the
>>> last LLVM bay area social, and partially a discussion about the direction
>>> of the CGSCC pass manager)
>>>
>>>
>>> A the last LLVM social we discussed the progress on the CGSCC pass
>>> manager. It seems like Chandler has a CGSCC pass manager working, but it is
>>> still unresolved exactly which semantics we want (more about this below)
>>> that are reasonably implementable.
>>>
>>> AFAICT, there has been no public discussion about what exact semantics
>>> we ultimately want to have. We should figure that out.
>>>
>>> The main difficulty which Chandler described is the apparently quite
>>> complex logic surrounding needing to run function passes nested within an
>>> SCC pass manager, while providing some guarantees about exactly what order
>>> the function passes are run. The existing CGSCC pass manager just punts on
>>> some of the problems that arise (look in CGPassManager::runOnModule,
>>> CGPassManager::RunAllPassesOnSCC, and CGPassManager::RunPassOnSCC in
>>> llvm/lib/Analysis/CallGraphSCCPass.cpp), and these are the problems that
>>> Chandler has been trying to solve.
>>>
>>> (
>>> Why is this "function passes inside CGSCC passes" stuff interesting?
>>> Because LLVM can do inlining on an SCC (often just a single function) and
>>> then run function passes to simplify the function(s) in the SCC before it
>>> tries to inline into a parent SCC. (the SCC visitation order is post-order)
>>> For example, we may inline a bunch of code, but after inlining we can
>>> tremendously simplify the function, and we want to do so before considering
>>> this function for inlining into its callers so that we get an accurate
>>> evaluation of the inline cost.
>>> Based on what Chandler said, it seems that LLVM is fairly unique in this
>>> regard and other compilers don't do this (which is why we can't just look
>>> at how other compilers solve this problem; they don't have this problem
>>> (maybe they should? or maybe we shouldn't?)). For example, he described
>>> that GCC uses different inlining "phases"; e.g. it does early inlining on
>>> the entire module, then does simplifications on the entire module, then
>>> does late inlining on the entire module; so it is not able to incrementally
>>> simplify as it inlines like LLVM does.
>>> )
>>>
>>> As background for what is below, the LazyCallGraph tracks two graphs:
>>> the "call graph" and the "ref graph".
>>> Conceptually, the call graph is the graph of direct calls, where
>>> indirect calls and calls to external functions do not appear (or are
>>> connected to dummy nodes). The ref graph is basically the graph of all
>>> functions transitively accessible based on the globals/constants/etc.
>>> referenced by a function (e.g. if a function `foo` references a vtable that
>>> is defined in the module, there is an edge in the ref graph from `foo` to
>>> every function in the vtable).
>>> The call graph is a strict subset of the ref graph.
>>>
>>> Chandler described that he had a major breakthrough in that the CGSCC
>>> pass manager only had to deal with 3 classes of modifications that can
>>> occur:
>>> - a pass may e.g. propagate a load of a function pointer into an
>>> indirect call, turning it into an direct call. This requires adding an edge
>>> in the CG but not in the ref graph.
>>> - a pass may take a direct call and turn it into an indirect call. This
>>> requires removing an edge from the CG, but not in the ref graph.
>>> - a pass may delete a direct call. This removes an edge in the CG and
>>> also in the ref graph.
>>>
>>> From the perspective of the CGSCC pass manager, these operations can
>>> affect the SCC structure. Adding an edge might merge SCC's and deleting an
>>> edge might split SCC's. Chandler mentioned that apparently the issues of
>>> splitting and merging SCC's within the current infrastructure are actually
>>> quite challenging and lead to e.g. iterator invalidation issues, and that
>>> is what he is working on.
>>>
>>> (
>>> The ref graph is important to guide the overall SCC visitation order
>>> because it basically represents "the largest graph that the CG may turn
>>> into due to our static analysis of this module". I.e. no transformation we
>>> can statically make in the CGSCC passes can ever cause us to need to merge
>>> SCC's in the ref graph.
>>> )
>>>
>>>
>>>
>>> I have a couple overall questions/concerns:
>>>
>>>
>>> 1. The ref graph can easily go quadratic. E.g.
>>>
>>> typedef void (*fp)();
>>> fp funcs[] = {
>>>   &foo1,
>>>   &foo2,
>>>   ...
>>>   &fooN
>>> }
>>> void foo1() { funcs[something](); }
>>> void foo2() { funcs[something](); }
>>> ...
>>> void fooN() { funcs[something](); }
>>>
>>> One real-world case where this might come about is in the presence of
>>> vtables.
>>>
>>> The existing CGSCC pass manager does not have this issue AFAIK because
>>> it does not consider the ref graph.
>>>
>>> Does anybody have any info/experience about how densely connected the
>>> ref graph can get in programs that might reasonably be fed to the compiler?
>>>
>>
>>
>> I can state that almost all call graphs of compilers include edges for
>> indirect calls and external functions, so they are already quadratic in
>> this sense.
>>
>> if what you state is correct, and we don't have a conservatively correct
>> call graph, that would be ... interesting.
>>
>
> Mehdi clarified downthread the situation (at least for me).
>
> Now that I look more closely at the comments in
> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/LazyCallGraph.h
>  it intentionally does not model a call graph in this sense. I.e.
> - in a traditional CG, an edge means something like "at runtime there may
> be a call from A->B"
> - in the LazyCallGraph an edge (a "ref edge" as it calls it) represents
> something like "during optimization of this module, we may discover the
> existence of a direct call from A->B". There is also a distinguished
> subgraph of the ref graph (which I think LazyCallGraph calls just the "call
> graph") which represents the actual direct calls that are present in the
> module currently.
>
> The comments in LazyCallGraph.h are quite good, but the existing
> CallGraph.h doesn't seem to touch on this up front in its comments in quite
> the same way, but it does at least say that it models with two external
> nodes like Mehdi mentioned:
>
> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/CallGraph.h#L30
> So its edges don't represent "at runtime there may be a call from A->B".
> But since it doesn't maintain a "ref graph" I'm not sure what the edges
> exactly represent.
>
>
>
> I thought of it as the edges are "there is a direct call from A -> B".
> Which is a subset of "at runtime there may be a call from A->B".
>
> I think that with all this discussion, it is important to distinguish that
> (I think) there is no "correctness" issue at stance (we won't miscompile
> anything)
>

This is a good point worth explicitly noting (and hopefully there are in
fact no passes relying on it to mean "at runtime there may be a call from
A->B").

> , but there may be missing optimization in some cases. I think the current
> scheme catches most cases and when it does not we are just missing
> potential inlining. The question may be how much (more) cases we really
> need to catch with the new pass manager?
>

I think this only affects inlining of mutually recursive functions. Most
functions are not mutually recursive [citation needed] so I'm not too
worried.
(The main performance-critical case that I can think of using mutual
recursion would be parsers).

-- Sean Silva

> And could a first implementation not catch everything and be improved
> incrementally? This comes back somehow to what Hal was mentioning
> (reproducing the current behavior before improving it).
>
> --
> Mehdi
>
>
>
>
>
>
>
>> The solution to most issues (large sccs, etc) that exist here that most
>> other compilers take is to try to make the call graph more precise, not to
>> avoid indirect/external calls in the call graph.
>>
>> In turn, this means the solution often take is to not have two graphs at
>> all.
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160608/793d0aff/attachment.html>