[llvm-dev] Intended behavior of CGSCC pass manager.

Fri Jun 17 00:27:25 PDT 2016

On Fri, Jun 17, 2016 at 12:10 AM, Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Thu, Jun 16, 2016 at 11:51 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Jun 16, 2016 at 11:07 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jun 16, 2016 at 10:47 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Jun 16, 2016 at 9:53 PM, Xinliang David Li <davidxl at google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 16, 2016 at 11:12 AM, Sanjoy Das <
>>>>> sanjoy at playingwithpointers.com> wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>>
>>>>>> On Thu, Jun 16, 2016 at 4:48 AM, Sean Silva via llvm-dev
>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>> > One question is what invariants we want to provide for the
>>>>>> visitation.
>>>>>> >
>>>>>> > For example, should a CGSCC pass be able to assume that all "child"
>>>>>> SCC's
>>>>>> > (SCC's it can reach via direct calls emanating from the SCC being
>>>>>> visited)
>>>>>> > have already been visited? Currently I don't think it can, and IIRC
>>>>>> from the
>>>>>> > discussion at the social this is one thing that Chandler is hoping
>>>>>> to fix.
>>>>>> > The "ref edge" notion in LazyCallGraph ensures that we cannot
>>>>>> create a call
>>>>>> > edge (devirtualizing a ref edge) that will point at an SCC that has
>>>>>> not yet
>>>>>> > been visited.
>>>>>> >
>>>>>> > E.g. consider this graph:
>>>>>> >
>>>>>> > digraph G {
>>>>>> >   A -> B; B -> A; // SCC {A,B}
>>>>>> >   S -> T; T -> S; // SCC {S,T}
>>>>>> >   X -> Y; Y -> X; // SCC {X,Y}
>>>>>> >
>>>>>> >   B -> X;
>>>>>> >   B -> S;
>>>>>> >   T -> Y [label="Ref edge that is devirtualized\nwhen visiting SCC
>>>>>> > {S,T}",constraint=false,style=dashed]
>>>>>> > }
>>>>>> >
>>>>>> > (can visualize conveniently at http://www.webgraphviz.com/ or I
>>>>>> have put an
>>>>>> > image at http://reviews.llvm.org/F2073104)
>>>>>> >
>>>>>> > If we do not consider the ref graph, then it is possible for SCC
>>>>>> {S,T} to be
>>>>>>
>>>>>> I'm not sure why you wouldn't consider the ref graph?  I think the
>>>>>> general idea is to visit RefSCCs in bottom up order, and when visiting
>>>>>> a RefSCC, visiting the SCC's inside the RefSCC in bottom up order.
>>>>>>
>>>>>> So in your example, given the edges you've shown, we will visit {X,Y}
>>>>>> before visiting {S,T}.
>>>>>>
>>>>>> > A more complicated case is when SCC {S,T} and SCC {X,Y} both call
>>>>>> into each
>>>>>> > other via function pointers. So eventually after devirtualizing the
>>>>>> calls in
>>>>>> > both directions there will be a single SCC {S,T,X,Y}.
>>>>>> >
>>>>>> > digraph G {
>>>>>> >   A -> B; B -> A; // SCC {A,B}
>>>>>> >   S -> T; T -> S; // SCC {S,T}
>>>>>> >   X -> Y; Y -> X; // SCC {X,Y}
>>>>>> >
>>>>>> >   B -> X;
>>>>>> >   B -> S;
>>>>>> >   T -> Y [label="Ref edge that is devirtualized\nwhen visiting SCC
>>>>>> > {S,T}",constraint=false,style=dashed]
>>>>>> >   X -> S [label="Ref edge that is devirtualized\nwhen visiting SCC
>>>>>> > {X,Y}",constraint=false,style=dashed]
>>>>>> > }
>>>>>> >
>>>>>> > (rendering at: http://reviews.llvm.org/F2073479)
>>>>>> >
>>>>>> > Due to the cyclic dependence there is no SCC visitation order that
>>>>>> can
>>>>>> > directly provide the invariant above. Indeed, the problem of
>>>>>> maintaining the
>>>>>>
>>>>>> I think the workflow in the above will (roughly) be:
>>>>>>
>>>>>>  Visit the RefSCC {X,Y,S,T}
>>>>>>
>>>>>
>>>>> Are we sure RefSCC has ref edges between {X, Y} and {S, T} in this
>>>>> case?  I may miss the code handling it.
>>>>>
>>>>
>>>> The ref graph is conservative and so it would have the appropriate
>>>> edges.
>>>>
>>>> https://github.com/llvm-project/llvm-project/blob/master/llvm/lib/Analysis/LazyCallGraph.cpp#L90
>>>>
>>>
>>>
>>> That is where the confusion is -- the findReferences's function body
>>> only handles constant operands which may be function addresses.  Did I miss
>>> something obvious?
>>>
>>
>> It is somewhat subtle. There are 3 potential meanings of "call graph" in
>> this thread:
>> 1. The graph of *direct calls* in the current module. (this is mutated
>> during optimization)
>> 2. A supergraph of 1., conservatively chosen such 1. remains a subgraph
>> under arbitrary semantics-preserving function transformations (note: 2. and
>> 1. must be updated in sync during deletion of functions).
>> 3. The conservative graph representing all edges which may exist *at
>> runtime*.
>>
>> LazyCallGraph models 1. (the "call graph") and 2. (the "ref graph"). It
>> does not model 3.
>> The search for constants in findReferences guarantees that we find (a
>> conserative superset of) all call destinations addresses that
>> transformations may add direct calls (and hence update 1.).
>>
>> The existing CallGraph data structure (used by e.g. the old PM CGSCC
>> visitation) only models 1.
>>
>>
>>
>>>   Another point is that it may not be practical to model edges to
>>> indirect targets. For virtual calls, without CHA, each virtual callsite
>>> will end up referencing all potential address taken functions.
>>>
>>
>> In this statement, you are thinking in terms of 3.
>> Note that in both 1. and 2. the indirect calls are modeled as calling an
>> external dummy node. This dummy node is *distinct* from a second external
>> dummy node that calls all address-taken functions. (hence the indirect
>> calls do not end up forming a RefSCC with all address taken functions).
>>
>
>
> For C++ programs, most ref edges are probably from ctors and dtors that
> reference vtable -- they will have large fan-out of ref edges to virtual
> member functions they are unlikely to call.  In other words, such ref edges
> mostly do not represent real caller->callee relationship, so I wonder what
> is the advantage of forming refSCC from SCCs?
>

I believe it is primarily used for ordering the visitation of CallSCC's
(i.e. SCC's in the "call graph").

-- Sean Silva

>
> David
>
>
>
>> Note that `opt -dot-callgraph` can produce a graphviz file for the old PM
>> callgraph. I'm not sure if we have one for LazyCallGraph (which has both
>> ref graph and call graph); I'll post a patch adding one if not.
>>
>> -- Sean Silva
>>
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>    Visit the SCC {X,Y} // arbitrary
>>>>>>      Optimize({X,Y})
>>>>>>      // Now there's an edge to {S,T}, invalidate
>>>>>>      // the analyses cached for {X,Y} and visit {S,T}
>>>>>>
>>>>>
>>>>> I am not sure if this makes sense.   If dynamically, the call edge
>>>>> from {X, Y} to {S, T} does exist, but not discovered by the analysis, then
>>>>> the cached {X, Y} will still be invalid, but who is going to invalidate it?
>>>>>
>>>>
>>>> I assume that if dynamically there was a call from {X,Y} to {S,T}, then
>>>> the analysis would have observed an indirect call and would have behaved
>>>> conservatively.
>>>>
>>>
>>> See above, I did not see how indirect call is handled. Also if the
>>> result for {X, Y} is conservatively correct before the direct call edge is
>>> discovered, why bother invalidating its analysis when the call edge is
>>> exposed?
>>>
>>> David
>>>
>>>
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>    Visit the SCC {S,T}
>>>>>>      Optimize({S,T})
>>>>>>      // Now {X,Y,S,T} collapses to form a single SCC
>>>>>>    Visit the SCC {S,T,X,Y}
>>>>>>      Optimize({S,T,X,Y})
>>>>>>
>>>>>> The difficult bit is to make the inner "// Now.*" bits work well.
>>>>>>
>>>>>> -- Sanjoy
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160617/faf9638a/attachment.html>