<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 03/10/2015 10:14 AM, Diego Novillo
wrote:<br>
</div>
<blockquote
cite="mid:CAD_=9DSzNSqpku0FjpeLyugaxshcxY9fLyyZ5EmD=i5TVDiqRA@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Mar 5, 2015 at 11:29 AM, Bob
Wilson <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:bob.wilson@apple.com" target="_blank"
class="cremed">bob.wilson@apple.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div>
<div class="h5"><br>
<div>
<blockquote type="cite">
<div>On Mar 2, 2015, at 4:19 PM, Diego Novillo
<<a moz-do-not-send="true"
href="mailto:dnovillo@google.com"
target="_blank" class="cremed">dnovillo@google.com</a>>
wrote:</div>
<br>
<div>
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Thu, Feb 26,
2015 at 6:54 PM, Diego Novillo <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:dnovillo@google.com"
target="_blank" class="cremed">dnovillo@google.com</a>></span>
wrote:</div>
<div class="gmail_quote"><br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">
<div>I've created a few bugzilla
issues with details of some of the
things I'll be looking into. I'm
not yet done wordsmithing the
overall design document. I'll try
to finish it by early next week at
the latest.</div>
</div>
</blockquote>
<div><br>
</div>
<div>The document is available at</div>
<div><br>
</div>
<div><a moz-do-not-send="true"
href="https://docs.google.com/document/d/15VNiD-TmHqqao_8P-ArIsWj1KdtU-ElLFaYPmZdrDMI/edit?usp=sharing"
target="_blank" class="cremed">https://docs.google.com/document/d/15VNiD-TmHqqao_8P-ArIsWj1KdtU-ElLFaYPmZdrDMI/edit?usp=sharing </a></div>
<div><br>
</div>
<div>There are several topics covered.
Ideally, I would prefer that we
discuss each topic separately. The
main ones I will start working on are
the ones described in the bugzilla
links we have in the doc.</div>
<div><br>
</div>
<div>This is just a starting point for
us. I am not at all concerned with
implementing exactly what is proposed
in the document. In fact, if we can
get the same value using the existing
support, all the better.</div>
<div><br>
</div>
<div>OTOH, any other ideas that folks
may have that work better than this
are more than welcome. I don't have
really strong opinions on the matter.
I am fine with whatever works.</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
<div>Thanks for the detailed write-up on this. Some of
the issues definitely need to be addressed. I am
concerned, though, that some of the ideas may be
leading toward a scenario where we have essentially
two completely different ways of representing profile
information in LLVM IR. It is great to have two
complementary approaches to collecting profile data,
but two representations in the IR would not make
sense.</div>
</div>
</blockquote>
<div><br>
</div>
<div>Yeah, I don't think I'll continue to push for a new
MD_count attribute. If we were to make MD_prof be a "real"
execution count, that would be enough. Note that by
re-using MD_prof we are not changing its meaning at all.
The execution count is still a weight and the ratio is
still branch probability. All that we are changing are the
absolute values of the number and increasing its data type
width to remove the 32bit limitation.</div>
</div>
</div>
</div>
</blockquote>
Independent of everything else, relaxing the 32 bit restriction is
clearly a good idea. This would make a great standalone patch. <br>
<blockquote
cite="mid:CAD_=9DSzNSqpku0FjpeLyugaxshcxY9fLyyZ5EmD=i5TVDiqRA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><span style="font-size:12.8000001907349px"><br>
</span></div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div><br>
</div>
<div>The first issue raised is that profile execution
counts are not represented in the IR. This was a very
intentional decision. I know it goes against what
other compilers have done in the past. It took me a
while to get used to the idea when Andy first
suggested it, so I know it seems awkward at first. The
advantage is that branch probabilities are much easier
to keep updated in the face of compiler
transformations, compared to execution counts.</div>
</div>
</blockquote>
<div><br>
</div>
<div>Sorry. I don't follow. Updating counts as the CFG is
transformed is not difficult at all. What examples do you
have in mind? The big advantage of making MD_prof an
actual <span style="font-size:12.8000001907349px">execution
count is that it is a meaningful metric wrt scaling and
transformation.</span></div>
<div>
<div><span style="font-size:12.8000001907349px"><br>
</span></div>
<div><span style="font-size:12.8000001907349px">Say, for
instance, that we have a branch instruction with two
targets with counts {100, 300} inside a function 'foo'
that has entry count 2. The edge probability for the
first edge (count 100) is 100/(100+300) = 25%.</span></div>
<div><span style="font-size:12.8000001907349px"><br>
</span></div>
<div><span style="font-size:12.8000001907349px">If we
inline foo() inside another function bar() at a
callsite with profile count == 1, the cloned branch
instruction gets its counters scaled with the callsite
count. So the new branch has counts {100 * 1 / 2, 300
* 1 / 2} = {50, 150}. But the branch probability did
not change. Currently, we are cloning the branch
without changing the edge weights.</span></div>
<div><span style="font-size:12.8000001907349px"><br>
</span></div>
<div><span style="font-size:12.8000001907349px">This
scaling is not difficult at all and can be
incrementally very quickly. We cannot afford to
recompute all frequencies on the fly because it would
be detrimental to compile time. If foo() itself has N
callees inlined into it, each inlined callee needs to
trigger a re-computation. When foo() is inlined into
bar(), the frequencies will need to be recomputed for
foo() and all N callees inlined into foo().</span></div>
</div>
</div>
</div>
</div>
</blockquote>
It really sounds like your proposal is to essentially eagerly
compute scaling rather than lazyily compute it on demand. Assuming
perfect implementations for both (with no rounding losses), the
results should be the same. Is that a correct restatement? I'm
going to hold off on responding to why that's a bad idea until you
confirm, because I'm not sure I follow what you're trying to say.
:)<br>
<br>
Also, trusting exact entry counts is going to be somewhat suspect.
These are *highly* susceptible to racy updates, overflow, etc...
Anything which puts too much implicit trust in these numbers is
going to be problematic. <br>
<blockquote
cite="mid:CAD_=9DSzNSqpku0FjpeLyugaxshcxY9fLyyZ5EmD=i5TVDiqRA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>
<div> <br>
</div>
</div>
<div> <br style="font-size:12.8000001907349px">
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div> We are definitely missing the per-function
execution counts that are needed to be able to compare
relative “hotness” across functions, and I think that
would be a good place to start making improvements. In
the long term, we should keep our options open to
making major changes, but before we go there, we
should try to make incremental improvements to fix the
existing infrastructure.</div>
</div>
</blockquote>
<div><br>
</div>
<div>Right, and that's the core of our proposal. We don't
really want to make major infrastructure changes at this
point. The only thing I'd like to explore is making
MD_prof a real count. This will be useful for the inliner
changes and it would also make incremental updates easier,
because the scaling that needs to be done is very
straightforward and quick.</div>
<div><br>
</div>
<div>Note that this change should not modify the current
behaviour we get from profile analysis. Things that were
hot before should continue to be hot now.</div>
</div>
</div>
</div>
</blockquote>
I have no objection to adding a mechanism for expressing an entry
count. I am still very hesitant about the proposals with regards to
redefining the current MD_prof. <br>
<br>
I'd encourage you to post a patch for the entry count mechanism, but
not tie its semantics to exact execution count. (Something like
"the value provided must correctly describe the relative hotness of
this routine against others in the program annoatated with the same
metadata. It is the relative scaling that is important, not the
absolute value. In particular, the value need not be an exact
execution count.")<br>
<blockquote
cite="mid:CAD_=9DSzNSqpku0FjpeLyugaxshcxY9fLyyZ5EmD=i5TVDiqRA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div><br>
</div>
<div>Many of the other issues you raise seem like they
could also be addressed without major changes to the
existing infrastructure. Let’s try to fix those first.</div>
</div>
</blockquote>
</div>
<br>
</div>
<div class="gmail_extra">That's exactly the point of the
proposal. We definitely don't want to make major changes to
the infrastructure at first. My thinking is to start working
on making MD_prof a real count. One of the things that are
happening is that the combination of real profile data plus
the frequency propagation that we are currently doing is
misleading the analysis.</div>
</div>
</blockquote>
I consider this a major change. You're trying to redefine a major
part of the current system. <br>
<br>
Multiple people have spoken up and objected to this change (as
currently described). Please start somewhere else. <br>
<blockquote
cite="mid:CAD_=9DSzNSqpku0FjpeLyugaxshcxY9fLyyZ5EmD=i5TVDiqRA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">For example (thanks David for the code
and data). In the following code:</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">
<div class="gmail_extra"><font face="monospace, monospace">int
g;</font></div>
<div class="gmail_extra"><font face="monospace, monospace">__attribute__((noinline))
void bar() {</font></div>
<div class="gmail_extra"><font face="monospace, monospace"> g++;</font></div>
<div class="gmail_extra"><font face="monospace, monospace">}</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">extern
int printf(const char*, ...);</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">int
main()</font></div>
<div class="gmail_extra"><font face="monospace, monospace">{</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
int i, j, k;</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace"> g
= 0;</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
// Loop 1.</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
for (i = 0; i < 100; i++)</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
for (j = 0; j < 100; j++)</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
for (k = 0; k < 100; k++)</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
bar();</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
printf ("g = %d\n", g);</font></div>
<div class="gmail_extra"><font face="monospace, monospace"> g
= 0;</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
// Loop 2.</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
for (i = 0; i < 100; i++)</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
for (j = 0; j < 10000; j++)</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
bar();</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
printf ("g = %d\n", g);</font></div>
<div class="gmail_extra"><font face="monospace, monospace"> g
= 0;</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
// Loop 3.</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
for (i = 0; i < 1000000; i++)</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
bar();</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="monospace, monospace">
printf ("g = %d\n", g);</font></div>
<div class="gmail_extra"><font face="monospace, monospace"> g
= 0;</font></div>
<div class="gmail_extra"><font face="monospace, monospace">}</font></div>
<div class="gmail_extra"><font face="monospace, monospace"><br>
</font></div>
<div class="gmail_extra"><font face="arial, helvetica,
sans-serif">When compiled with profile instrumentation,
frequency propagation is distorting the real profile
because it gives different frequency to the calls to bar()
in the 3 different loops. All 3 loops execute 1,000,000
times, but after frequency propagation, the first call to
bar() gets a weight of 520,202 in loop #1, 210,944 in
loop #2 and 4,096 in loop #3. In reality, every call to
bar() should have a weight of 1,000,000.</font></div>
</div>
</div>
</blockquote>
<font face="arial, helvetica, sans-serif">Duncan responded to this.
My conclusion from his response: this is a bug, not a fundamental
issue. Remove the max scaling factor, switch the counts to 64
bits and everything should be fine. If you disagree, let's
discuss. </font><br>
<blockquote
cite="mid:CAD_=9DSzNSqpku0FjpeLyugaxshcxY9fLyyZ5EmD=i5TVDiqRA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_extra"><font face="arial, helvetica,
sans-serif"><br>
</font></div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">Thanks. Diego.</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a class="moz-txt-link-freetext" href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a>
<a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>
</pre>
</blockquote>
<br>
</body>
</html>