<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">2016-12-30 22:22 GMT+01:00 Daniel Berlin <span dir="ltr">&lt;<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>

<br>

</span>I understand your argument that after optimizing to a fixed point, we should get the same result.  The problem is that the *order* of iteration will differ.  This can mean differences in naming, memory allocation patterns, or even output (I don't believe GVNPRE actually iterates to a fixed point.)<br></blockquote><div><br></div></span><div>GVN + PRE these days does iterate to a fixpoint, but it also gives up if it doesn't converge quickly.</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span><br>

> I can continue looping and either try hard to find local dependency or 'the best' non-local dependency (which probably won't be found because GVN will already remove it)<br>

>  and return it, or even collect list of all non local dependencies.<br>

<br>

</span>The key point is that we should return a consistent result, regardless of which query path we see.  Returning the most dominating non-local would be one reasonable scheme.<br>

<span><br>

> I am also not sure if I understand what 'the best' non local dependency means.<br>

<br>

</span>Don't get too caught up on "best" here.  The ordering point above is the primary concern.  A secondary concern is reducing the number of iterations required to reach the fixed point.  Why do multiple full scans if we can bypass all but one with a small amount of extra work?  Particularly work that we know only happens when we have found a useful result and are just trying to find a better one?  (i.e. we're not burning time when we're not making progress.)<br></blockquote><div><br></div><div><br></div></span><div>+1 to this however. Truthfully, we can always make invariant group loads faster. We could, for example, link them together, etc.</div><div><br></div><div>This is not the thing that is hard to speed up :)</div><div><br></div><div><br></div></div></div></div></blockquote><div>That's true, but this is something I would prefer to do in MemSSA :)</div><div>AFAIK !invariant.group is only emitted with -fstrict-vtable-pointers in clang, and I have never heard of any other frontend using it. Because -fstrict-vtable-pointers is still not the default, this work is more of an experiment. Before making it the default, there are a couple of other problems to solve (like skipping barriers, etc.) that I would rather not implement in MemDep and would instead do in MemSSA.</div><div>But it would be nice to say that we reached a milestone in devirtualization for clang-4.0. </div><div><br></div><div>Piotr</div><div><br></div><div>PS: consider sending replies in Phabricator, because email replies sometimes don't appear there (like this one)</div><div> </div></div><br></div></div>