<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Jun 29, 2016 at 1:16 AM Sean Silva via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jun 29, 2016 at 12:03 AM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Tue, Jun 28, 2016 at 5:41 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span style="line-height:1.5">As an example of the kind of missed optimizations, I'm going to add a test case where we successfully deduce readnone with the new pass manager but can't (no matter how many iterations we make) deduce it with the old one. This is because we don't refine the SCC graph structure, and do principled visitation to that graph structure.</span><br></blockquote></span><span><div><br></div></span><div>I saw the test case. The question is there anything preventing SCC graph be refined for the second iteration in the old framework?</div></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I tested this with my proof of concept port of the old visit order to the new pass manager [*]. With the following tiny modification of RefreshCallGraph, the port passes both of Chandler's tests: <a href="http://reviews.llvm.org/P6012" target="_blank">http://reviews.llvm.org/P6012</a></div></div></div></div></blockquote><div><br></div><div>As you mentioned, I had over-simplified my test cases, and they weren't testing what I was describing in email, sorry about that. They no longer required a *refined* SCC graph.</div><div><br></div><div>I've updated the test cases to actually check the refined SCC graph, and even with the patch you mentioned, the old pass manager can't figure this out.</div><div><br></div><div>Of course, there is a way to get this with the old pass manager. You have to re-build the call graph. But that requires a full re-run of the bottom-up walk. You can see this by doing 'opt -functionattrs -simplifycfg -functionattrs' which will get this case, but will build an entirely new call graph to do so.</div><div><span style="line-height:1.5"><br></span></div><div><span style="line-height:1.5"><br></span></div><div><span style="line-height:1.5">What I'm trying to point out is a fundamental limitation that is addressed by doing on-line update and refinement of the graph without re-starting the entire pipeline. Supporting that + identity based caching for the new pass manager were the two driving motivations for the design of the LazyCallGraph and I continue to think having this functionality is really essential.</span><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>In fact, I even had to modify cgscc-observe-devirt.ll because the old visitation order (with the small patch) is able to infer readnone on test1_b (Chandler's patch has a FIXME with "AFTER2" as the desired behavior).</div></div></div></div></blockquote><div><br></div><div>Yes, I haven't added the devirtualization trigger that the old pass manager has. I can add it easily in a follow-up patch, but it is orthogonal to this one as it doesn't actually require call graph structural updates, it just requires detecting indirect -> direct call transformations.</div><div><br></div><div>If its useful, I'm happy to prioritize implementing that and show you what it would look like in this model. It is pretty easy to do. It mostly requires a trivial recreation of an extra data structure that the old Call Graph provides (at great cost) that has a value handle for all the indirect calls so we can notice when they change.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>The test cases could be probably be modified so that the old visitation logic (or small modifications thereof) would fail yet Chandler's proposed logic would succeed. But the test cases seem to get increasingly far-fetched.</div></div></div></div></blockquote><div><br></div><div>I don't think the modifications are terribly far fetched. All it requires is for it to involve SCC refinement.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> Regardless, if/when the evidence appears for those cases being important to optimize, we can add them; porting to the new PM is orthogonal.</div><div><br></div><div>[*] Hopefully I will start an RFC and send out a proof of concept patch tomorrow; I just need to test out a proof of concept port of the inliner on something bigger to make sure everything works.<br></div></div></div></div></blockquote><div><br></div><div>As I have said in this thread, several other threads when LazyCallGraph was first designed, and in several other discussions, I continue to think this is the wrong approach.</div><div><br></div><div>Not having an identity preserving call graph data structure will make the new pass manager design flawed. I think we will have to add hacks to get invalidation and preservation correct in that model, and will continue to observe the fact that we cannot cache analyses on the actual SCCs.</div><div><br></div><div>While I understand wanting to pursue another option if we didn't have this patch which implements the more principled logic, given that this patch is out and I'm starting work on porting the inliner now, I'd somewhat like to continue down this path unless there are specific insurmountable problems that people see here.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div>-- Sean Silva</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>If we are really keen in solving this problem, another way is to do a simple module pass to do the cleanup before the SCC is built. This will at least work for the examples in the test.</div><span><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> The examples in the test case are somewhat contrived in order to make good test cases, but they hopefully represent the general pattern that I'm worried about here: we are using the set of functions in the SCC to reason about it as an SCC but not actually updating that set of functions when the code changes.<br></blockquote><div><br></div></span><div>I understand the motivation. What I want to know  is the complexity/benefit trade-offs here. Using refined attributes due to de-virtualization as example is not convincing IMO. </div><div><br></div><div>thanks,</div><div><br></div><div>David</div><div><div class="m_947967621396955811h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

> The general idea is clearly to maintain a bottom-up visitation behavior, but since this patch doesn't implement edge addition (which represents devirtualization and so is pretty important) the intended behavior there is unclear.<br>

<br>

<br>

So, this *does* implement ref -> call promotion which is what I would expect devirtualization to look like in all cases. See below for details on your question.<br>

<br>

The other case is for "out-of-thin-air" call (or ref) insertion such as instrumentation passes might do. Currently this isn't supported but could be added in the future.<br>

<br>

<br>

================<br>

Comment at: lib/Analysis/CGSCCPassManager.cpp:90<br>

@@ +89,3 @@<br>

+    for (Instruction &I : BB)<br>

+      if (auto CS = CallSite(&I))<br>

+        if (Function *Callee = CS.getCalledFunction())<br>

----------------<br>

(in case reading this against the current patch, this originally was attached to the FIXME regarding handling adding new calls)<br>

<br>

As i somewhat alluded to above, what you describe should be handled by a ref edge turning into a call edge, and the update mechanism should be able to handle that well.<br>

<br>

I've added a test case that exercises this with GVN and function-attrs. There is currently a case missed in the new pass manager because we don't have the up-to-four iteration whenever an indirect call turns into a direct call heuristic that the old pass manager has. I'm happy to add that, but I'd like to add it in a follow-up patch. I've marked where in the test case this is missed, and I've demonstrated that in theory this update mechanism is sufficient to handle it by explicitly running function-attrs again and it correctly catches the refinement.<br>

<br>

The direct answer to your question is #2: it continues running on the now-larger SCC, detects that we switch from one SCC to another at some point, and re-runs on that SCC to make sure that the refined graph is observed.<br>

<span><br>

================<br>

Comment at: lib/Analysis/CGSCCPassManager.cpp:170<br>

@@ +169,3 @@<br>

+<br>

+    // The easy case is when the target RefSCC is not this RefSCC. This is<br>

+    // only supported when the target RefSCC is a child of this RefSCC.<br>

----------------<br>

</span><span>sanjoy wrote:<br>

> Doesn't this also hold if the target SCC is not this SCC?<br>

</span>Yes, and inside the implementation of switchInternalEdgeToRef, it early exits with an empty range when we hit that scenario.<br>

<br>

I can lift that distinction up into this code if you think that would be helpful, the somewhat arbitrary split was that these methods are on a RefSCC, and so it is the RefSCC-easy case that callers have to handle.<br>

<br>

<br>

<a href="http://reviews.llvm.org/D21464" rel="noreferrer" target="_blank">http://reviews.llvm.org/D21464</a><br>

<br>

<br>

<br>

</blockquote></div></div></div><br></div></div>

</blockquote></div></div></div>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

</blockquote></div></div>