<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 8/4/2017 2:55 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Aug 4, 2017 at 11:41 AM, Chad
Rosier <span dir="ltr"><<a
href="mailto:mcrosier@codeaurora.org" target="_blank"
moz-do-not-send="true">mcrosier@codeaurora.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<p><br>
</p>
<br>
<div
class="gmail-m_4340832051958399694moz-cite-prefix">On
8/4/2017 2:06 PM, Daniel Berlin wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">A few notes:<br>
I'm a bit surprised IPO copy/constant propagation
doesn't get this case, but i didn't look if the
lattice supports variables.
<div>In particular, in your example, given no
other call sites, it should eliminate the dead
code.</div>
<div>(In a real program, it may require cloning).</div>
</div>
</blockquote>
<br>
</span> In the actual program (SPEC2017/gcc,
ironically), there are multiple calls to fn2 and only
one of them has the property that the 1st and 2nd
argument are the same (as is shown in my pseudo code).
Internally, we have another developer, Matt Simpson,
working on a function specialization patch that might be
of value here. Specifically, you could clone fn2 based
on the fact that a_ptr == dst_ptr and then simplify a
great deal of the function. However, that patch is
still a WIP.</div>
</blockquote>
<div><br>
</div>
<div>FWIW: You almost certainly want to integrate that with
IPA based constant propagation, as it is the thing you
should be using to tell you what will happen in the call.
It should actually not be difficult at all (I can give you
references to papers, but it's just a couple hundred lines
of code on top of our current propagation engine)</div>
</div>
</div>
</div>
</blockquote>
<br>
If I'm not mistaken, this is exactly what Matt is doing. :D<br>
<blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>(It can also later be used to do type-base devirt).</div>
<div><br>
</div>
<div>GCC will do partial specialization (IE contextually
decide to clone callsites where it believes the
constantness/etc will cause elimination)</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>GCC will do IPA-CP/const-prop with cloning,
and i'm wildly curious if new GCC's catch this
case for you at higher optimization levels?</div>
</div>
</blockquote>
<br>
</span> GCC does inline fn2 into fn1 in this particular
case, but I'm not exactly sure how GCC accomplishes
this. I'm guessing GCC is just more aggressive with its
inlining (fn2 is also marked with the inline keyword,
which I assume GCC uses as a hint). I'm speculating
here and I've never worked on GCC, so unfortunately I
have little to go on.</div>
</blockquote>
<div><br>
</div>
<div>I meant if you turn off inlining :)</div>
</div>
</div>
</div>
</blockquote>
Doh. Right.<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>I can take a gander though, given the info you've
given.</div>
<div><br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>If so, it may be worth not looking at this as
an inlining problem, but as an area we need IPO
infrastructure improvement <br>
</div>
</div>
</blockquote>
<br>
</span> Because of the multiple callsites with varying
characteristics I'm not sure this can be solved in this
way.</div>
</blockquote>
<div><br>
</div>
<div>FWIW: It definitely can. Whether we want to, ....</div>
</div>
</div>
</div>
</blockquote>
Yes, you're right.<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>That said, this is the whole purpose of IPA const/copy
prop.</div>
<div>LLVM stops at the "propagate things that are always
constant in every case" whereas, most compilers do "if
worth it, clone this function callsite where i can prove
it will be constant ".</div>
<div><br>
</div>
<div>You probably not want to rely on inlining for all
possible IPO effects. Especially in this case, where IPO
can actually do the job.</div>
</div>
</div>
</div>
</blockquote>
Yes, that makes sense.<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>IMHO, if you can, you want to reserve inlining-as-IPO
for the cases where the IPO algorithms are
difficult/expensive/etc.</div>
<div><br>
</div>
<div>Or, at the very least, drive inlining like this by the
results of those IPO algorithms.</div>
</div>
</div>
</div>
</blockquote>
<br>
Sounds reasonable. I'll throw this idea around with Matt and Geoff.<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BwTVm5LxrTE6k8-335afeij10x3Or6mjqwKsC21H=o4bKAQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><br>
</div>
<div>Otherwise, a couple things:
<div>
<div>Approximate dominators (for example,
semi-dominators) can be computed fast (a
DFS walk of the CFG with no real
additional computation)</div>
<div>Except in strange CFGs that jump around
a lot, they are the dominators.</div>
<div><br>
</div>
<div>More importantly, the dominator is
either the sdom or a proper ancestor of
the sdom.</div>
<div><br>
</div>
<div>The practical impact of this is that if
you use them as if they were dominators,
the set of conditions you discover will
not be "too wrong". Occasionally wrong,
but mostly not.</div>
<div><br>
</div>
<div>My guess is the cost of doing
approximate dominators is ~50-60% of the
cost of doing dominators. Nowadays, about
half the time was in the DFS walk, the
other half in the computation. At best, it
would be 2-3x faster.</div>
<div>I've no idea if this changes whether
we'd want dominators, approximate
dominators, or stick with nothing.</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> Right, this is kinda one of the bigger questions
I'm trying to figure out. My proposed solution doesn't
use the dominator tree in order to minimize the impact
on compile-time. However, I'd guess the ROI is going to
be much smaller because of the limited scope. On the
other end of the spectrum I'd fear the full dominator
tree would be too computationally expensive (but of
course some of that could be mitigated by the ability to
do incremental updates to the dominator tree).<span
class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div><br>
</div>
<div>If you have some sort of dominatorish
tree could then just use earlycse's method
of dominating condition finding:</div>
<div>Process in "dom tree" top-down order,
push the equivalences you see, pop the
relevant ones when you exit the relevant
dom tree scope.</div>
<div><br>
</div>
<div>In practice, you'd only check
comparisons against the hash table.</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> Humm.. I'll have to think about it for a bit.
I'm thinking this might be a good compromise for my
needs.<span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div><br>
</div>
<div>The other option is PredicateInfo, but
it requires dominators and modifies the
IR.</div>
<div><br>
</div>
<div>My guess is this is undesired/too
heavyweight for inline cost analysis,
however the basic principle on how it
renames things could also be applied
without IR changing for this specific
case. Unlike the EarlyCSE method, which
is O(all instructons) PredicateInfo is
O(branches + number of uses of variables
affected by conditions) Without going
into futher details, if all you care about
is "for each condition, give me the set of
possibly affected variables" (so you can
see if they may simplify), we could do
that very very quickly (as fast as we can
sort a vector). But it does require
dominators.</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> For my particular problem, I think PredicateInfo
would be sufficient IIUYC. But as you suggest, I'm
thinking people aren't going to be fond of using the
full dominators.<br>
<br>
Lots of great feedback. Thanks, Danny.
<div>
<div class="gmail-h5"><br>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>