<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
On 8/7/2017 1:02 PM, Daniel Berlin wrote:<br>
<blockquote type="cite"
cite="mid:CAF4BwTU732c6J_dEVJHGhEYQHaQgCkWf+hcOijJmdu_qt2VoZg@mail.gmail.com">
<div dir="ltr">Can someone fill me in on the issue with the
dominator tree, precisely, during inlining?
<div>We now have the capability of quickly keeping it up to date
without too much trouble (it may require pushing it through a
bunch of places, but the actual changes to do should be easy).</div>
</div>
</blockquote>
<br>
If I'm not mistaken (which I very well could be since I'm no expert
of the pass managers, but) I believe we need to use the new pass
manager to allow CGSCC passes to use cached Function analyses (i.e.,
domtree). IIUC, this is what Sean was trying to point out.
Assuming that's true, we should be able to use the new functionality
to preserve the domtree until we get to the inliner.<br>
<br>
I'm more then happy to work towards this approach, if the domtree
would be useful in the inliner..<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BwTU732c6J_dEVJHGhEYQHaQgCkWf+hcOijJmdu_qt2VoZg@mail.gmail.com">
<div dir="ltr">
<div>Was the original issue cost of rebuilding it repeatedly
during inlining, or what?<br>
</div>
</div>
</blockquote>
<br>
Yes, this was my primary concern.<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BwTU732c6J_dEVJHGhEYQHaQgCkWf+hcOijJmdu_qt2VoZg@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Aug 7, 2017 at 7:56 AM, Sander
De Smalen via llvm-dev <span dir="ltr"><<a
href="mailto:llvm-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
<div class="m_-8006846865748462121WordSection1">
<p class="MsoNormal"><span>Hi,</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>Coincidentally I've been
working to optimize this same case last week. I was
struggling a bit to determine where to put this
functionality and eventually went for the pragmatic
approach of creating an experimental pass. Probably
not the eventual solution, but it may provide some
useful input to the discussion here.</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>Basically, I experimented
with a 'pre-inlining-transform' pass that clones the
callsite if it can prove that one of the arguments
to the call can be replaced by a constant, based on
dominating conditions, e.g.:</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> if (!ptr || ptr
&& ptr->val)</span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> foo(ptr, ...)</span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> =></span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> if (!ptr)</span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> foo(nullptr, ...)</span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> else if (ptr &&
ptr->val)</span></p>
<p class="MsoNormal"><span
style="font-family:"Andale
Mono",sans-serif"> foo(ptr
/*knownNonNull*/, ...)</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>Here the first argument
becomes constant for the first callsite and a
further analysis pass sets the KnownNonNull
attribute on the first argument in the second
callsite. The InlinerCost algorithm can then
determine it is cheap enough to inline both cases,
because it knows the callee (foo) distinguishes
between the two cases. In the callee, the check for
'(ptrA == ptrB)' in function foo becomes constant
for the first callsite because it knows ptrA is
nullptr.</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>To keep compile-time down, it
doesn’t use the dominator tree as it seemed
sufficient to look at the direct predecessors of the
block containing the call (and their
single-predecessors, for 'and'ed conditions). To
keep the cost of duplicating code down, it only
clones the block upto the call if the number of
instructions stays below some threshold, which in
practice reduces the cases to simple callsites with
directly dominating conditions. The reasoning behind
this is that duplicating the callsite is probably
‘cheap’ if it can eliminate an argument with a
constant, since this is often a pointer and is
likely to be checked for ‘nullptr’ in the callee
which will improve the inline cost.</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>We’re still collecting
numbers to check there are no regressions from
cloning the callsite and the pass is still a bit
work in progress, but I should be able to share this
work if anyone is interested (if only just for
reference).</span></p>
<p class="MsoNormal"><span> </span></p>
<p class="MsoNormal"><span>Sander</span></p>
<p class="MsoNormal"><span> </span></p>
<div style="border:none;border-top:solid #b5c4df
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:12.0pt;color:black">From: </span></b><span
style="font-size:12.0pt;color:black">llvm-dev <<a
href="mailto:llvm-dev-bounces@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev-bounces@lists.llvm.<wbr>org</a>>
on behalf of Sean Silva via llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>><br>
<b>Reply-To: </b>Sean Silva <<a
href="mailto:chisophugis@gmail.com"
target="_blank" moz-do-not-send="true">chisophugis@gmail.com</a>><br>
<b>Date: </b>Friday, 4 August 2017 at 23:07<span
class=""><br>
<b>To: </b>Chad Rosier <<a
href="mailto:mcrosier@codeaurora.org"
target="_blank" moz-do-not-send="true">mcrosier@codeaurora.org</a>><br>
<b>Cc: </b>llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>><br>
<b>Subject: </b>Re: [llvm-dev]
[RFC][InlineCost] Modeling JumpThreading (or
similar) in inline cost model</span></span></p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">On Fri, Aug 4, 2017 at 11:41
AM, Chad Rosier via llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:</p>
<div>
<div class="h5">
<blockquote
style="border:none;border-left:solid #cccccc
1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p> </p>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">On 8/4/2017 2:06
PM, Daniel Berlin wrote:</p>
</div>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">A few notes:<br>
I'm a bit surprised IPO
copy/constant propagation doesn't
get this case, but i didn't look if
the lattice supports variables.
</p>
<div>
<p class="MsoNormal">In particular,
in your example, given no other
call sites, it should eliminate
the dead code.</p>
</div>
<div>
<p class="MsoNormal">(In a real
program, it may require cloning).</p>
</div>
</div>
</blockquote>
<p class="MsoNormal"><br>
In the actual program (SPEC2017/gcc,
ironically), there are multiple calls to
fn2 and only one of them has the
property that the 1st and 2nd argument
are the same (as is shown in my pseudo
code). Internally, we have another
developer, Matt Simpson, working on a
function specialization patch that might
be of value here. Specifically, you
could clone fn2 based on the fact that
a_ptr == dst_ptr and then simplify a
great deal of the function. However,
that patch is still a WIP.<br>
<br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal">GCC will do
IPA-CP/const-prop with cloning,
and i'm wildly curious if new
GCC's catch this case for you at
higher optimization levels?</p>
</div>
</div>
</blockquote>
<p class="MsoNormal"><br>
GCC does inline fn2 into fn1 in this
particular case, but I'm not exactly
sure how GCC accomplishes this. I'm
guessing GCC is just more aggressive
with its inlining (fn2 is also marked
with the inline keyword, which I assume
GCC uses as a hint). I'm speculating
here and I've never worked on GCC, so
unfortunately I have little to go on.<br>
<br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal">If so, it may
be worth not looking at this as an
inlining problem, but as an area
we need IPO infrastructure
improvement
</p>
</div>
</div>
</blockquote>
<p class="MsoNormal"><br>
Because of the multiple callsites with
varying characteristics I'm not sure
this can be solved in this way.<br>
<br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Otherwise, a
couple things: </p>
<div>
<div>
<p class="MsoNormal">Approximate
dominators (for example,
semi-dominators) can be
computed fast (a DFS walk of
the CFG with no real
additional computation)</p>
</div>
<div>
<p class="MsoNormal">Except in
strange CFGs that jump
around a lot, they are the
dominators.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">More
importantly, the dominator
is either the sdom or a
proper ancestor of the sdom.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">The
practical impact of this is
that if you use them as if
they were dominators, the
set of conditions you
discover will not be "too
wrong". Occasionally wrong,
but mostly not.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">My guess
is the cost of doing
approximate dominators is
~50-60% of the cost of doing
dominators. Nowadays, about
half the time was in the DFS
walk, the other half in the
computation. At best, it
would be 2-3x faster.</p>
</div>
<div>
<p class="MsoNormal">I've no
idea if this changes whether
we'd want dominators,
approximate dominators, or
stick with nothing.</p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p class="MsoNormal"><br>
Right, this is kinda one of the bigger
questions I'm trying to figure out. My
proposed solution doesn't use the
dominator tree in order to minimize the
impact on compile-time. </p>
</div>
</blockquote>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">In the new PM, it's
possible for a CGSCC pass to get a cached
function analysis from functions that have
been visited previously. This requires the
CGSCC iteration on SCC's we visited
earlier to keep an up to date domtree, and
I don't know if that's the case (or how
much work it would be to make it the
case).</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">-- Sean Silva</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<blockquote
style="border:none;border-left:solid #cccccc
1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">However, I'd guess
the ROI is going to be much smaller
because of the limited scope. On the
other end of the spectrum I'd fear the
full dominator tree would be too
computationally expensive (but of course
some of that could be mitigated by the
ability to do incremental updates to the
dominator tree).<br>
<br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">If you
have some sort of
dominatorish tree could then
just use earlycse's method
of dominating condition
finding:</p>
</div>
<div>
<p class="MsoNormal">Process
in "dom tree" top-down
order, push the equivalences
you see, pop the relevant
ones when you exit the
relevant dom tree scope.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">In
practice, you'd only check
comparisons against the hash
table.</p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p class="MsoNormal"><br>
Humm.. I'll have to think about it for a
bit. I'm thinking this might be a good
compromise for my needs.<br>
<br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">The other
option is PredicateInfo, but
it requires dominators and
modifies the IR.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">My guess
is this is undesired/too
heavyweight for inline cost
analysis, however the basic
principle on how it renames
things could also be applied
without IR changing for this
specific case. Unlike the
EarlyCSE method, which is
O(all instructons)
PredicateInfo is O(branches
+ number of uses of
variables affected by
conditions) Without going
into futher details, if all
you care about is "for each
condition, give me the set
of possibly affected
variables" (so you can see
if they may simplify), we
could do that very very
quickly (as fast as we can
sort a vector). But it does
require dominators.</p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p class="MsoNormal"><br>
For my particular problem, I think
PredicateInfo would be sufficient
IIUYC. But as you suggest, I'm thinking
people aren't going to be fond of using
the full dominators.<br>
<br>
Lots of great feedback. Thanks, Danny.
</p>
<div>
<div>
<p class="MsoNormal"><br>
<br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
</div>
</div>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">On Fri, Aug
4, 2017 at 8:56 AM, Chad
Rosier <<a
href="mailto:mcrosier@codeaurora.org"
target="_blank"
moz-do-not-send="true">mcrosier@codeaurora.org</a>>
wrote:</p>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0in 0in
0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p><tt><span
style="font-size:10.0pt">All,</span></tt></p>
<p><tt><span
style="font-size:10.0pt">I'm
working on an
improvement to the
inline cost model, but
I'm unsure how to
proceed. Let me begin
by first describing
the problem I'm trying
to solve. Consider
the following pseudo C
code:</span></tt></p>
<p><b><span
style="font-family:"Courier
New",serif">typedef
struct element {<br>
unsigned idx;<br>
} element_t;</span></b></p>
<p><b><span
style="font-family:"Courier
New",serif">static
inline<br>
unsigned char fn2
(element_t *dst_ptr,
const element_t
*a_ptr,<br>
const element_t
*b_ptr, unsigned char
changed) {<br>
if (a_ptr &&
b_ptr &&
a_ptr->idx ==
b_ptr->idx) {<br>
if (!changed
&& dst_ptr
&&
dst_ptr->idx ==
a_ptr->idx) {<br>
/* Do something
*/<br>
} else {<br>
changed = 1;<br>
if (!dst_ptr)<br>
dst_ptr =
fn3();<br>
else<br>
dst_ptr->idx =
a_ptr->idx;<br>
/* Do something.
*/<br>
}<br>
} else {<br>
changed = fn4();<br>
}<br>
return changed;<br>
}<br>
<br>
unsigned char fn1
(element_t *a_ptr,
element_t *b_ptr) {<br>
unsigned char
changed = 0;<br>
while (b_ptr) {<br>
if (!a_ptr ||
a_ptr->idx ==
b_ptr->idx) {<br>
changed = fn2
(a_ptr, a_ptr, b_ptr,
changed);<br>
b_ptr =
b_ptr->next;<br>
}<br>
}<br>
return changed;<br>
}</span></b></p>
<p><tt><span
style="font-size:10.0pt">When
the inline cost model
computes the inline
cost of fn2 it ends up
being much higher than
the inline threshold.
A fair amount of the
cost is due to the
inlining of fn3 into
fn2. However, if fn2
had been inlined into
fn1 the code from fn3
would have been
removed as
dead/unreachable.</span></tt></p>
<p><tt><span
style="font-size:10.0pt">At
the fn2 call site
notice the first two
arguments are equal.
Thus, in the context
of fn2 dst_ptr and
a_ptr are equal. The
call site of fn3 is
predicated on dst_ptr
being null (i.e., if
(!dst_ptr) dst_ptr =
fn3()), but that code
is predicated on a_ptr
being non-null.
Therefore, we know the
condition !dst_ptr is
false (because a_ptr
== dst_ptr and a_ptr
is non-null) and the
call to fn3 is dead.
I suspect one of
JumpThreading,
EarlyCSE, or GVN does
the elimination after
inlining, so that's
what I'd like to try
and model in the
inline cost model.
(Note fn2 has multiple
call sides and the
property that the
first and second
arguments are equal
isn't true for each
call site, so
something like IPSCCP
doesn't actually help,
AFAICT).</span></tt></p>
<p><tt><span
style="font-size:10.0pt">My
first attempt at
solving this problem
did something similar
to what is done in
JumpThreadingPass::<wbr>ProcessImpliedCondition().
Specifically, it tried
to prove that dst_ptr
was non-null based on
a dominating
condition. The only
tricky parts were to
deal with
hammocks/diamonds when
walking up the CFG
(See:
<a
href="https://reviews.llvm.org/D36287"
target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/<wbr>D36287</a>
as a concrete example
of how I proposed to
get an immediate
dominator without the
domtree) and to
account for the fact
that dst_ptr and a_ptr
are equal.</span></tt></p>
<p><tt><span
style="font-size:10.0pt">I'm
pretty sure I can get
this approach to work,
however, I'm not
convinced it's really
extensible or
general. Should we
consider using the
full dominator tree in
the inline cost model
to capture this?</span></tt></p>
<p><tt><span
style="font-size:10.0pt">If
you have any thoughts
on how to tackle the
problem, I would love
to hear your feedback!</span></tt></p>
<p><tt><span
style="font-size:10.0pt;color:#888888"> Chad</span></tt><span
style="color:#888888"></span></p>
</div>
</blockquote>
</div>
<p class="MsoNormal"> </p>
</div>
</blockquote>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
<p class="MsoNormal"
style="margin-bottom:12.0pt"><br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
<a
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></p>
</blockquote>
</div>
</div>
</div>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged.
If you are not the intended recipient, please notify the
sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the
information in any medium. Thank you.
</div>
<br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
<a
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>