[PATCH] D21464: [PM] WIP: Introduce basic update capabilities to the new PM's CGSCC pass manager, including both plumbing and logic to handle function pass updates.

Wed Jun 29 22:13:30 PDT 2016

On Wed, Jun 29, 2016 at 9:37 PM Sean Silva <chisophugis at gmail.com> wrote:

> On Wed, Jun 29, 2016 at 5:42 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
>> On Wed, Jun 29, 2016 at 1:16 AM Sean Silva via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> On Wed, Jun 29, 2016 at 12:03 AM, Xinliang David Li <davidxl at google.com>
>>> wrote:
>>>
>>>> On Tue, Jun 28, 2016 at 5:41 PM, Chandler Carruth <chandlerc at gmail.com>
>>>> wrote:
>>>>
>>>>> As an example of the kind of missed optimizations, I'm going to add a
>>>>> test case where we successfully deduce readnone with the new pass manager
>>>>> but can't (no matter how many iterations we make) deduce it with the old
>>>>> one. This is because we don't refine the SCC graph structure, and do
>>>>> principled visitation to that graph structure.
>>>>>
>>>>
>>>> I saw the test case. The question is there anything preventing SCC
>>>> graph be refined for the second iteration in the old framework?
>>>>
>>>
>>> I tested this with my proof of concept port of the old visit order to
>>> the new pass manager [*]. With the following tiny modification of
>>> RefreshCallGraph, the port passes both of Chandler's tests:
>>> http://reviews.llvm.org/P6012
>>>
>>
>> As you mentioned, I had over-simplified my test cases, and they weren't
>> testing what I was describing in email, sorry about that. They no longer
>> required a *refined* SCC graph.
>>
>> I've updated the test cases to actually check the refined SCC graph, and
>> even with the patch you mentioned, the old pass manager can't figure this
>> out.
>>
>> Of course, there is a way to get this with the old pass manager. You have
>> to re-build the call graph. But that requires a full re-run of the
>> bottom-up walk. You can see this by doing 'opt -functionattrs -simplifycfg
>> -functionattrs' which will get this case, but will build an entirely new
>> call graph to do so.
>>
>>
>> What I'm trying to point out is a fundamental limitation that is
>> addressed by doing on-line update and refinement of the graph without
>> re-starting the entire pipeline. Supporting that + identity based caching
>> for the new pass manager were the two driving motivations for the design of
>> the LazyCallGraph and I continue to think having this functionality is
>> really essential.
>>
>>
>>> In fact, I even had to modify cgscc-observe-devirt.ll because the old
>>> visitation order (with the small patch) is able to infer readnone on
>>> test1_b (Chandler's patch has a FIXME with "AFTER2" as the desired
>>> behavior).
>>>
>>
>> Yes, I haven't added the devirtualization trigger that the old pass
>> manager has. I can add it easily in a follow-up patch, but it is orthogonal
>> to this one as it doesn't actually require call graph structural updates,
>> it just requires detecting indirect -> direct call transformations.
>>
>> If its useful, I'm happy to prioritize implementing that and show you
>> what it would look like in this model. It is pretty easy to do. It mostly
>> requires a trivial recreation of an extra data structure that the old Call
>> Graph provides (at great cost) that has a value handle for all the indirect
>> calls so we can notice when they change.
>>
>>
>>>
>>> The test cases could be probably be modified so that the old visitation
>>> logic (or small modifications thereof) would fail yet Chandler's proposed
>>> logic would succeed. But the test cases seem to get increasingly
>>> far-fetched.
>>>
>>
>> I don't think the modifications are terribly far fetched. All it requires
>> is for it to involve SCC refinement.
>>
>
> Can you please find and show some concrete examples from real programs?
>

I'm not sure what you're looking for here... An example of an SCC in real
code that gets refined by scalar passes? I have seen examples of indirect
recursion which happens behind a predicate that might be constant folded. I
don't have a list of them though, and I'm not sure it is a really good use
of time to try to find them and enumerate them.

If you mean (as I suspect you do) an example where failure to refine the
SCC model in turn proves a limitation for optimization, that I don't have a
concrete example of, and it would be extremely hard to find an example:
- The best way to identify this is to have the infrastructure available to
test for it.
- It also cannot account for future SCC-based optimizations that we might
want to add.

One of my goals with establishing a more principled design for this is to
lay a powerful foundation of infrastructure. Not every aspect of that is
directly and immediately relevant, but without it I strongly fear we will
end up precisely where we are today with the pass manager. It was years
before we had examples from real programs that really showcased the
limitations it suffered from.

I would still really love to talk about any concerns or problems you see
with the design. Currently the one I'm thinking about the most is quadratic
behaviors, but if there are other concerns I'd like to discuss them.

>
>
>>
>>
>>> Regardless, if/when the evidence appears for those cases being important
>>> to optimize, we can add them; porting to the new PM is orthogonal.
>>>
>>> [*] Hopefully I will start an RFC and send out a proof of concept patch
>>> tomorrow; I just need to test out a proof of concept port of the inliner on
>>> something bigger to make sure everything works.
>>>
>>
>> As I have said in this thread, several other threads when LazyCallGraph
>> was first designed, and in several other discussions, I continue to think
>> this is the wrong approach.
>>
>
> Can you provide links to those discussions? I can't find anything in my
> email.
>

I have tried very hard to find them, and haven't succeeded. Really sorry
about this. I know some of them took place outside of email sadly, but my
memory is that some did take place in the discussion threads around the
pass manager and the need for having a cache key for SCCs.

That said, my only real goal in mentioning this is that I don't want you or
others to think that this is something that was never discussed. I even
clearly remember Hal having reservations about this. ;] I know I myself
would have preferred to maintain precise similarity to the current pass
manager, but I don't see a way that results in a robust interaction with
cache based management of analyses.

Anyways, I'm happy to try to explain this freshly if useful, but I'm not
sure what the remaining points of contention or confusion are.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160630/d366ea4f/attachment.html>