[PATCH] D21464: [PM] WIP: Introduce basic update capabilities to the new PM's CGSCC pass manager, including both plumbing and logic to handle function pass updates.

Wed Jun 29 21:37:23 PDT 2016

On Wed, Jun 29, 2016 at 5:42 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> On Wed, Jun 29, 2016 at 1:16 AM Sean Silva via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> On Wed, Jun 29, 2016 at 12:03 AM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>> On Tue, Jun 28, 2016 at 5:41 PM, Chandler Carruth <chandlerc at gmail.com>
>>> wrote:
>>>
>>>> As an example of the kind of missed optimizations, I'm going to add a
>>>> test case where we successfully deduce readnone with the new pass manager
>>>> but can't (no matter how many iterations we make) deduce it with the old
>>>> one. This is because we don't refine the SCC graph structure, and do
>>>> principled visitation to that graph structure.
>>>>
>>>
>>> I saw the test case. The question is there anything preventing SCC graph
>>> be refined for the second iteration in the old framework?
>>>
>>
>> I tested this with my proof of concept port of the old visit order to the
>> new pass manager [*]. With the following tiny modification of
>> RefreshCallGraph, the port passes both of Chandler's tests:
>> http://reviews.llvm.org/P6012
>>
>
> As you mentioned, I had over-simplified my test cases, and they weren't
> testing what I was describing in email, sorry about that. They no longer
> required a *refined* SCC graph.
>
> I've updated the test cases to actually check the refined SCC graph, and
> even with the patch you mentioned, the old pass manager can't figure this
> out.
>
> Of course, there is a way to get this with the old pass manager. You have
> to re-build the call graph. But that requires a full re-run of the
> bottom-up walk. You can see this by doing 'opt -functionattrs -simplifycfg
> -functionattrs' which will get this case, but will build an entirely new
> call graph to do so.
>
>
> What I'm trying to point out is a fundamental limitation that is addressed
> by doing on-line update and refinement of the graph without re-starting the
> entire pipeline. Supporting that + identity based caching for the new pass
> manager were the two driving motivations for the design of the
> LazyCallGraph and I continue to think having this functionality is really
> essential.
>
>
>> In fact, I even had to modify cgscc-observe-devirt.ll because the old
>> visitation order (with the small patch) is able to infer readnone on
>> test1_b (Chandler's patch has a FIXME with "AFTER2" as the desired
>> behavior).
>>
>
> Yes, I haven't added the devirtualization trigger that the old pass
> manager has. I can add it easily in a follow-up patch, but it is orthogonal
> to this one as it doesn't actually require call graph structural updates,
> it just requires detecting indirect -> direct call transformations.
>
> If its useful, I'm happy to prioritize implementing that and show you what
> it would look like in this model. It is pretty easy to do. It mostly
> requires a trivial recreation of an extra data structure that the old Call
> Graph provides (at great cost) that has a value handle for all the indirect
> calls so we can notice when they change.
>
>
>>
>> The test cases could be probably be modified so that the old visitation
>> logic (or small modifications thereof) would fail yet Chandler's proposed
>> logic would succeed. But the test cases seem to get increasingly
>> far-fetched.
>>
>
> I don't think the modifications are terribly far fetched. All it requires
> is for it to involve SCC refinement.
>

Can you please find and show some concrete examples from real programs?

>
>
>> Regardless, if/when the evidence appears for those cases being important
>> to optimize, we can add them; porting to the new PM is orthogonal.
>>
>> [*] Hopefully I will start an RFC and send out a proof of concept patch
>> tomorrow; I just need to test out a proof of concept port of the inliner on
>> something bigger to make sure everything works.
>>
>
> As I have said in this thread, several other threads when LazyCallGraph
> was first designed, and in several other discussions, I continue to think
> this is the wrong approach.
>

Can you provide links to those discussions? I can't find anything in my
email.

-- Sean Silva

>
> Not having an identity preserving call graph data structure will make the
> new pass manager design flawed. I think we will have to add hacks to get
> invalidation and preservation correct in that model, and will continue to
> observe the fact that we cannot cache analyses on the actual SCCs.
>
> While I understand wanting to pursue another option if we didn't have this
> patch which implements the more principled logic, given that this patch is
> out and I'm starting work on porting the inliner now, I'd somewhat like to
> continue down this path unless there are specific insurmountable problems
> that people see here.
>
>
>> -- Sean Silva
>>
>>
>>>
>>> If we are really keen in solving this problem, another way is to do a
>>> simple module pass to do the cleanup before the SCC is built. This will at
>>> least work for the examples in the test.
>>>
>>>
>>>
>>>> The examples in the test case are somewhat contrived in order to make
>>>> good test cases, but they hopefully represent the general pattern that I'm
>>>> worried about here: we are using the set of functions in the SCC to reason
>>>> about it as an SCC but not actually updating that set of functions when the
>>>> code changes.
>>>>
>>>
>>> I understand the motivation. What I want to know  is the
>>> complexity/benefit trade-offs here. Using refined attributes due to
>>> de-virtualization as example is not convincing IMO.
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>>
>>>> > The general idea is clearly to maintain a bottom-up visitation
>>>> behavior, but since this patch doesn't implement edge addition (which
>>>> represents devirtualization and so is pretty important) the intended
>>>> behavior there is unclear.
>>>>
>>>>
>>>> So, this *does* implement ref -> call promotion which is what I would
>>>> expect devirtualization to look like in all cases. See below for details on
>>>> your question.
>>>>
>>>> The other case is for "out-of-thin-air" call (or ref) insertion such as
>>>> instrumentation passes might do. Currently this isn't supported but could
>>>> be added in the future.
>>>>
>>>>
>>>> ================
>>>> Comment at: lib/Analysis/CGSCCPassManager.cpp:90
>>>> @@ +89,3 @@
>>>> +    for (Instruction &I : BB)
>>>> +      if (auto CS = CallSite(&I))
>>>> +        if (Function *Callee = CS.getCalledFunction())
>>>> ----------------
>>>> (in case reading this against the current patch, this originally was
>>>> attached to the FIXME regarding handling adding new calls)
>>>>
>>>> As i somewhat alluded to above, what you describe should be handled by
>>>> a ref edge turning into a call edge, and the update mechanism should be
>>>> able to handle that well.
>>>>
>>>> I've added a test case that exercises this with GVN and function-attrs.
>>>> There is currently a case missed in the new pass manager because we don't
>>>> have the up-to-four iteration whenever an indirect call turns into a direct
>>>> call heuristic that the old pass manager has. I'm happy to add that, but
>>>> I'd like to add it in a follow-up patch. I've marked where in the test case
>>>> this is missed, and I've demonstrated that in theory this update mechanism
>>>> is sufficient to handle it by explicitly running function-attrs again and
>>>> it correctly catches the refinement.
>>>>
>>>> The direct answer to your question is #2: it continues running on the
>>>> now-larger SCC, detects that we switch from one SCC to another at some
>>>> point, and re-runs on that SCC to make sure that the refined graph is
>>>> observed.
>>>>
>>>> ================
>>>> Comment at: lib/Analysis/CGSCCPassManager.cpp:170
>>>> @@ +169,3 @@
>>>> +
>>>> +    // The easy case is when the target RefSCC is not this RefSCC.
>>>> This is
>>>> +    // only supported when the target RefSCC is a child of this RefSCC.
>>>> ----------------
>>>> sanjoy wrote:
>>>> > Doesn't this also hold if the target SCC is not this SCC?
>>>> Yes, and inside the implementation of switchInternalEdgeToRef, it early
>>>> exits with an empty range when we hit that scenario.
>>>>
>>>> I can lift that distinction up into this code if you think that would
>>>> be helpful, the somewhat arbitrary split was that these methods are on a
>>>> RefSCC, and so it is the RefSCC-easy case that callers have to handle.
>>>>
>>>>
>>>> http://reviews.llvm.org/D21464
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160629/27e0bc95/attachment.html>