<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 5, 2015 at 11:29 AM, Bob Wilson <span dir="ltr"><<a href="mailto:bob.wilson@apple.com" target="_blank" class="cremed">bob.wilson@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div class="h5"><br><div><blockquote type="cite"><div>On Mar 2, 2015, at 4:19 PM, Diego Novillo <<a href="mailto:dnovillo@google.com" target="_blank" class="cremed">dnovillo@google.com</a>> wrote:</div><br><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Feb 26, 2015 at 6:54 PM, Diego Novillo <span dir="ltr"><<a href="mailto:dnovillo@google.com" target="_blank" class="cremed">dnovillo@google.com</a>></span> wrote:</div><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>I've created a few bugzilla issues with details of some of the things I'll be looking into. I'm not yet done wordsmithing the overall design document. I'll try to finish it by early next week at the latest.</div></div></blockquote><div><br></div><div>The document is available at</div><div><br></div><div><a href="https://docs.google.com/document/d/15VNiD-TmHqqao_8P-ArIsWj1KdtU-ElLFaYPmZdrDMI/edit?usp=sharing" target="_blank" class="cremed">https://docs.google.com/document/d/15VNiD-TmHqqao_8P-ArIsWj1KdtU-ElLFaYPmZdrDMI/edit?usp=sharing </a></div><div><br></div><div>There are several topics covered. Ideally, I would prefer that we discuss each topic separately. The main ones I will start working on are the ones described in the bugzilla links we have in the doc.</div><div><br></div><div>This is just a starting point for us. I am not at all concerned with implementing exactly what is proposed in the document. In fact, if we can get the same value using the existing support, all the better.</div><div><br></div><div>OTOH, any other ideas that folks may have that work better than this are more than welcome. I don't have really strong opinions on the matter. I am fine with whatever works.</div></div></div></div></div></blockquote><br></div></div></div><div>Thanks for the detailed write-up on this. Some of the issues definitely need to be addressed. I am concerned, though, that some of the ideas may be leading toward a scenario where we have essentially two completely different ways of representing profile information in LLVM IR. It is great to have two complementary approaches to collecting profile data, but two representations in the IR would not make sense.</div></div></blockquote><div><br></div><div>Yeah, I don't think I'll continue to push for a new MD_count attribute. If we were to make MD_prof be a "real" execution count, that would be enough. Note that by re-using MD_prof we are not changing its meaning at all. The execution count is still a weight and the ratio is still branch probability. All that we are changing are the absolute values of the number and increasing its data type width to remove the 32bit limitation.</div><div><span style="font-size:12.8000001907349px"><br></span></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>The first issue raised is that profile execution counts are not represented in the IR. This was a very intentional decision. I know it goes against what other compilers have done in the past. It took me a while to get used to the idea when Andy first suggested it, so I know it seems awkward at first. The advantage is that branch probabilities are much easier to keep updated in the face of compiler transformations, compared to execution counts.</div></div></blockquote><div><br></div><div>Sorry. I don't follow. Updating counts as the CFG is transformed is not difficult at all. What examples do you have in mind?  The big advantage of making MD_prof an actual <span style="font-size:12.8000001907349px">execution count is that it is a meaningful metric wrt scaling and transformation.</span></div><div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">Say, for instance, that we have a branch instruction with two targets with counts {100, 300} inside a function 'foo' that has entry count 2. The edge probability for the first edge (count 100) is 100/(100+300) = 25%.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">If we inline foo() inside another function bar() at a callsite with profile count == 1, the cloned branch instruction gets its counters scaled with the callsite count. So the new branch has counts {100 * 1 / 2, 300 * 1 / 2} = {50, 150}.  But the branch probability did not change. Currently, we are cloning the branch without changing the edge weights.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">This scaling is not difficult at all and can be incrementally very quickly. We cannot afford to recompute all frequencies on the fly because it would be detrimental to compile time. If foo() itself has N callees inlined into it, each inlined callee needs to trigger a re-computation. When foo() is inlined into bar(), the frequencies will need to be recomputed for foo() and all N callees inlined into foo().</span></div><div> <br></div></div><div> <br style="font-size:12.8000001907349px"></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div> We are definitely missing the per-function execution counts that are needed to be able to compare relative “hotness” across functions, and I think that would be a good place to start making improvements. In the long term, we should keep our options open to making major changes, but before we go there, we should try to make incremental improvements to fix the existing infrastructure.</div></div></blockquote><div><br></div><div>Right, and that's the core of our proposal. We don't really want to make major infrastructure changes at this point. The only thing I'd like to explore is making MD_prof a real count. This will be useful for the inliner changes and it would also make incremental updates easier, because the scaling that needs to be done is very straightforward and quick.</div><div><br></div><div>Note that this change should not modify the current behaviour we get from profile analysis. Things that were hot before should continue to be hot now.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>Many of the other issues you raise seem like they could also be addressed without major changes to the existing infrastructure. Let’s try to fix those first.</div></div></blockquote></div><br></div><div class="gmail_extra">That's exactly the point of the proposal.  We definitely don't want to make major changes to the infrastructure at first. My thinking is to start working on making MD_prof a real count. One of the things that are happening is that the combination of real profile data plus the frequency propagation that we are currently doing is misleading the analysis.</div><div class="gmail_extra"><br></div><div class="gmail_extra">For example (thanks David for the code and data). In the following code:</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra"><font face="monospace, monospace">int g;</font></div><div class="gmail_extra"><font face="monospace, monospace">__attribute__((noinline)) void bar() {</font></div><div class="gmail_extra"><font face="monospace, monospace"> g++;</font></div><div class="gmail_extra"><font face="monospace, monospace">}</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">extern int printf(const char*, ...);</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">int main()</font></div><div class="gmail_extra"><font face="monospace, monospace">{</font></div><div class="gmail_extra"><font face="monospace, monospace">  int i, j, k;</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  g = 0;</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  // Loop 1.</font></div><div class="gmail_extra"><font face="monospace, monospace">  for (i = 0; i < 100; i++)</font></div><div class="gmail_extra"><font face="monospace, monospace">    for (j = 0; j < 100; j++)</font></div><div class="gmail_extra"><font face="monospace, monospace">       for (k = 0; k < 100; k++)</font></div><div class="gmail_extra"><font face="monospace, monospace">           bar();</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  printf ("g = %d\n", g);</font></div><div class="gmail_extra"><font face="monospace, monospace">  g = 0;</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  // Loop 2.</font></div><div class="gmail_extra"><font face="monospace, monospace">  for (i = 0; i < 100; i++)</font></div><div class="gmail_extra"><font face="monospace, monospace">    for (j = 0; j < 10000; j++)</font></div><div class="gmail_extra"><font face="monospace, monospace">        bar();</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  printf ("g = %d\n", g);</font></div><div class="gmail_extra"><font face="monospace, monospace">  g = 0;</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  // Loop 3.</font></div><div class="gmail_extra"><font face="monospace, monospace">  for (i = 0; i < 1000000; i++)</font></div><div class="gmail_extra"><font face="monospace, monospace">    bar();</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="monospace, monospace">  printf ("g = %d\n", g);</font></div><div class="gmail_extra"><font face="monospace, monospace">  g = 0;</font></div><div class="gmail_extra"><font face="monospace, monospace">}</font></div><div class="gmail_extra"><font face="monospace, monospace"><br></font></div><div class="gmail_extra"><font face="arial, helvetica, sans-serif">When compiled with profile instrumentation, frequency propagation is distorting the real profile because it gives different frequency to the calls to bar() in the 3 different loops. All 3 loops execute 1,000,000 times, but after frequency propagation, the first call to bar() gets a weight of 520,202 in loop #1, 210,944 in  loop #2 and 4,096 in loop #3. In reality, every call to bar() should have a weight of 1,000,000.</font></div><div class="gmail_extra"><font face="arial, helvetica, sans-serif"><br></font></div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks.  Diego.</div></div></div>