[PATCH] D21464: [PM] WIP: Introduce basic update capabilities to the new PM's CGSCC pass manager, including both plumbing and logic to handle function pass updates.

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 29 22:54:10 PDT 2016


On Wed, Jun 29, 2016 at 10:13 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> On Wed, Jun 29, 2016 at 9:37 PM Sean Silva <chisophugis at gmail.com> wrote:
>
>> On Wed, Jun 29, 2016 at 5:42 PM, Chandler Carruth <chandlerc at gmail.com>
>> wrote:
>>
>>> On Wed, Jun 29, 2016 at 1:16 AM Sean Silva via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> On Wed, Jun 29, 2016 at 12:03 AM, Xinliang David Li <davidxl at google.com
>>>> > wrote:
>>>>
>>>>> On Tue, Jun 28, 2016 at 5:41 PM, Chandler Carruth <chandlerc at gmail.com
>>>>> > wrote:
>>>>>
>>>>>> As an example of the kind of missed optimizations, I'm going to add a
>>>>>> test case where we successfully deduce readnone with the new pass manager
>>>>>> but can't (no matter how many iterations we make) deduce it with the old
>>>>>> one. This is because we don't refine the SCC graph structure, and do
>>>>>> principled visitation to that graph structure.
>>>>>>
>>>>>
>>>>> I saw the test case. The question is: is there anything preventing the
>>>>> SCC graph from being refined for the second iteration in the old framework?
>>>>>
>>>>
>>>> I tested this with my proof-of-concept port of the old visit order to
>>>> the new pass manager [*]. With the following tiny modification of
>>>> RefreshCallGraph, the port passes both of Chandler's tests:
>>>> http://reviews.llvm.org/P6012
>>>>
>>>
>>> As you mentioned, I had over-simplified my test cases, and they weren't
>>> testing what I was describing in my email; sorry about that. As simplified,
>>> they no longer required a *refined* SCC graph.
>>>
>>> I've updated the test cases to actually check the refined SCC graph, and
>>> even with the patch you mentioned, the old pass manager can't figure this
>>> out.
>>>
>>> Of course, there is a way to get this with the old pass manager. You
>>> have to re-build the call graph. But that requires a full re-run of the
>>> bottom-up walk. You can see this by doing 'opt -functionattrs -simplifycfg
>>> -functionattrs' which will get this case, but will build an entirely new
>>> call graph to do so.
>>>
>>>
>>> What I'm trying to point out is a fundamental limitation that is
>>> addressed by doing on-line update and refinement of the graph without
>>> re-starting the entire pipeline. Supporting that and providing
>>> identity-based caching for the new pass manager were the two driving
>>> motivations for the design of LazyCallGraph, and I continue to think
>>> having this functionality is really essential.
>>>
>>>
>>>> In fact, I even had to modify cgscc-observe-devirt.ll because the old
>>>> visitation order (with the small patch) is able to infer readnone on
>>>> test1_b (Chandler's patch has a FIXME with "AFTER2" as the desired
>>>> behavior).
>>>>
>>>
>>> Yes, I haven't added the devirtualization trigger that the old pass
>>> manager has. I can add it easily in a follow-up patch, but it is orthogonal
>>> to this one, as it doesn't actually require call graph structural updates;
>>> it just requires detecting indirect -> direct call transformations.
>>>
>>> If it's useful, I'm happy to prioritize implementing that and show you
>>> what it would look like in this model. It is pretty easy to do. It mostly
>>> requires trivially recreating an extra data structure that the old
>>> CallGraph provides (at great cost): a value handle for each of the
>>> indirect calls, so we can notice when they change.
>>>
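(For my own understanding, the bookkeeping I picture for that is roughly the
sketch below. This is my guess at what you mean, not what the old CallGraph
actually does; the helper names are invented, and it only handles CallInst,
ignoring invokes, for brevity.)

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/InstIterator.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/ValueHandle.h"
  #include "llvm/Support/Casting.h"

  using namespace llvm;

  // Remember every call site in F with no statically known callee, via value
  // handles so a later-deleted instruction shows up as null, not dangling.
  static void collectIndirectCalls(Function &F,
                                   SmallVectorImpl<WeakVH> &IndirectCalls) {
    for (Instruction &I : instructions(F))
      if (auto *CI = dyn_cast<CallInst>(&I))
        if (!CI->getCalledFunction()) // no statically known callee
          IndirectCalls.push_back(WeakVH(CI));
  }

  // After the function passes have run: did any previously indirect call
  // acquire a direct callee? If so, the devirtualization trigger fires and
  // the CGSCC pipeline would want to revisit this SCC.
  static bool anyCallDevirtualized(ArrayRef<WeakVH> IndirectCalls) {
    for (WeakVH VH : IndirectCalls)
      if (auto *CI = dyn_cast_or_null<CallInst>((Value *)VH))
        if (CI->getCalledFunction())
          return true;
    return false;
  }
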
>>>
>>>>
>>>> The test cases could probably be modified so that the old visitation
>>>> logic (or small modifications thereof) would fail yet Chandler's proposed
>>>> logic would succeed. But the test cases seem to get increasingly
>>>> far-fetched.
>>>>
>>>
>>> I don't think the modifications are terribly far-fetched. All it
>>> requires is that they involve SCC refinement.
>>>
>>
>> Can you please find and show some concrete examples from real programs?
>>
>
> I'm not sure what you're looking for here... An example of an SCC in real
> code that gets refined by scalar passes? I have seen examples of indirect
> recursion that happens behind a predicate that might be constant folded. I
> don't have a list of them, though, and I'm not sure it is a really good use
> of time to try to find them and enumerate them.
>
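Just to check that we're picturing the same kind of code: here is the rough
source-level shape I take that to mean. It is entirely made up by me, not your
test case and not from any real codebase, and the pass behavior described in
the comments is simply my reading of your explanation above.

  // Made-up illustration of why SCC refinement matters. Initially {compute,
  // log_and_retry} form one SCC because of the guarded mutual recursion, and
  // log_and_retry writes memory, so nothing in that SCC can be deduced
  // readnone. Once the always-false guard is folded and the dead call is
  // deleted, compute drops into its own SCC, touches no memory, and can be
  // proven readnone, but only if the SCC graph is refined (or the call graph
  // is rebuilt from scratch, as a second -functionattrs run would do).
  #include <cstdio>

  static int g_attempts = 0; // log_and_retry writes this; it is not readnone

  static int log_and_retry(int x);

  static int compute(int x) {
    if (x < 0 && x > 0)        // always false; trivially constant-foldable
      return log_and_retry(x); // dead call; removing it splits the SCC
    return x * 2;              // after simplification: no memory access
  }

  static int log_and_retry(int x) {
    ++g_attempts;              // memory write keeps this off readnone
    return compute(x + 1);
  }

  int main() {
    std::printf("%d\n", compute(21));
    return 0;
  }
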
> If you mean (as I suspect you do) an example where failure to refine the
> SCC model in turn becomes a limitation for optimization, I don't have a
> concrete example of that, and it would be extremely hard to find one:
> - The best way to identify this is to have the infrastructure available to
> test for it.
> - Searching for examples also cannot account for future SCC-based
> optimizations that we might want to add.
>
> One of my goals with establishing a more principled design for this is to
> lay a powerful foundation of infrastructure. Not every aspect of that is
> directly and immediately relevant, but without it I strongly fear we will
> end up precisely where we are today with the pass manager. It was years
> before we had examples from real programs that really showcased the
> limitations it suffered from.
>

>
>
> I would still really love to talk about any concerns or problems you see
> with the design. Currently the one I'm thinking about the most is quadratic
> behaviors, but if there are other concerns I'd like to discuss them.
>

In principle, I don't have anything against using this more complicated
logic. While I think the burden should be on you to show that this complexity
is needed (we're talking about 2600 lines for LCG + another 500+ for this
rewriting logic; this is nontrivial), for the sake of making progress I think
it is fine to move forward with this (i.e., I'll start making non-high-level
comments).

My suspicion is that we'll run this on some real-world codebases, find that
most of this code rarely (if ever) gives any benefit, and be justified in
ripping it out and replacing it with a simpler strategy, but that can be
determined later. It would just have been so much nicer if you had started
with something simple and incrementally added this complexity with
performance numbers at each step, but at this point it is water under the
bridge.

(The internet was out at work today and other stuff came up, so I didn't
finish with the inliner. I'm wrapping up for the day right now, but all I
have left to do is get rid of the AssumptionCacheTracker passed through
llvm::getInlineCost and replace it with a callback or whatever. I'll post
the patches, at least for reference, tomorrow.)
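
Concretely, the shape I have in mind is roughly the following. This is a
hypothetical declaration, invented purely to illustrate the direction; it is
not the actual llvm::getInlineCost signature, and the parameter list is a
guess.

  // Hypothetical sketch of what I mean by "a callback or whatever": instead
  // of threading the legacy AssumptionCacheTracker pass through the inline
  // cost analysis, hand the analysis a function that can produce the
  // AssumptionCache for whatever Function it ends up asking about. Not the
  // real llvm::getInlineCost declaration; invented to show the shape only.
  #include <functional>

  namespace llvm {
  class AssumptionCache;
  class CallSite;
  class Function;
  class InlineCost;
  class TargetTransformInfo;

  InlineCost getInlineCost(
      CallSite CS, int DefaultThreshold, TargetTransformInfo &CalleeTTI,
      std::function<AssumptionCache &(Function &)> GetAssumptionCache);
  } // namespace llvm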

One thing I'm a bit worried about is that it seems like this scheme is
going to require incredibly invasive surgery on the existing inliner. Is
the plan to do a from-scratch rewrite there too? I'm extremely wary of
introducing divergence. At the very least, keeping a pile of real logic
essentially duplicated in tree for an extended period of time has a bad
smell. I saw that PostOrderFunctionAttrs had an uncomfortable amount of
duplication at the top level; I'd really like to avoid that for the
inliner.

-- Sean Silva


>
>
>>
>>
>>>
>>>
>>>> Regardless, if/when evidence appears that those cases are important to
>>>> optimize, we can add them; porting to the new PM is orthogonal.
>>>>
>>>> [*] Hopefully I will start an RFC and send out a proof-of-concept patch
>>>> tomorrow; I just need to test out a proof-of-concept port of the inliner on
>>>> something bigger to make sure everything works.
>>>>
>>>
>>> As I have said in this thread, several other threads when LazyCallGraph
>>> was first designed, and in several other discussions, I continue to think
>>> this is the wrong approach.
>>>
>>
>> Can you provide links to those discussions? I can't find anything in my
>> email.
>>
>
> I have tried very hard to find them, and haven't succeeded. Really sorry
> about this. I know some of them took place outside of email, sadly, but my
> memory is that some did take place in the discussion threads around the
> pass manager and the need for having a cache key for SCCs.
>
> That said, my only real goal in mentioning this is that I don't want you
> or others to think that this is something that was never discussed. I even
> clearly remember Hal having reservations about this. ;] I know I myself
> would have preferred to maintain precise similarity to the current pass
> manager, but I don't see a way that results in a robust interaction with
> cache-based management of analyses.
>
> Anyways, I'm happy to try to explain this freshly if useful, but I'm not
> sure what the remaining points of contention or confusion are.
>