<div dir="ltr">Thanks for the analysis David, more inline.<br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 26, 2016 at 4:43 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Wed, Oct 26, 2016 at 3:03 PM, Michael Kuperstein <span dir="ltr"><<a href="mailto:mkuper@google.com" target="_blank">mkuper@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Wed, Oct 26, 2016 at 1:09 PM, David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">davidxl added inline comments.<br>
<span><br>
<br>
================<br>
Comment at: lib/Transforms/Utils/LoopUnrollPeel.cpp:101<br>
+ // We no longer know anything about the branch probability.<br>
+ LatchBR->setMetadata(LLVMContext::MD_prof, nullptr);<br>
+ }<br>
----------------<br>
</span><span>mkuper wrote:<br>
> davidxl wrote:<br>
> > Why? I think we should update the branch probability here -- it depends on which iteration the peeled clone corresponds to. If peel count < average/estimated trip count, then each peeled iteration should be more biased towards fall through. If peel_count == est trip_count, then the last peeled iteration should be biased toward exit.<br>
> You're right, it's not that we don't know anything - but we don't know enough. I'm not sure how to attach a reasonable number to this, without knowing the distribution.<br>
> Do you have any suggestions? The trivial option would be to assume an extremely narrow distribution (the loop always exits after exactly K iterations), but that would mean having an extreme bias for all of the branches, and I'm not sure that's wise.<br>
</span>A reasonable way to annotate the branch is like this.<br>
Say the original trip count of the loop is N; then for the m-th peeled iteration (m from 0 to N-1), the fall-through probability is a decreasing function of m:<br>
<br>
(N - m )/N<br>
<br></blockquote><div><br></div></span><div>I'm not entirely sure the math works out - because N is the average</div></div></div></div></blockquote><div><br></div></span><div>Yes -- N is the average -- but this is due to a limitation of PGO. To get the trip count distribution, we need to do value profiling of the loop trip count or have a path-sensitive profile. This is future work. For now we need to focus on what we have with good heuristics. </div><div><br></div></div></div></div></blockquote><div> </div><div><div>Sure, didn't mean to imply otherwise. </div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>With current PGO, the back branch probability is already estimated to be N/(N+1), which can be inaccurate depending on the trip count distribution.</div><span><div><br></div></span></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> the newly assigned weights ought to have the property that the total probability of reaching the loop header is 0.5 -</div></div></div></div></blockquote><div><br></div><div><br></div></span><div>Why should this constraint exist? 
The constraints that should be satisfied are 1) the total frequency of the loop exit remains unchanged; 2) the total header (including cloned ones) frequency equals the original header frequency; 3) the header frequency of the first peeled iteration equals the original preheader frequency</div><div><br></div></div></div></div></blockquote><div><br></div><div>You're right, the constraint I suggested is nonsense, it's really distribution-dependent.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div> The original conditional branch (for the loop back edge) has one shared/'average' branch probability for all iterations. Once the branch is cloned via peeling, more context (temporal) information is available, so the conditional branch probabilities of those cloned branches can be refined -- the intuition is that the closer the iteration is to the end of the loop, the more likely the branch is to exit.</div></div></div></div></blockquote><div><br></div><div>What bothers me somewhat is that this doesn't hold in general.</div><div>It does hold if the distribution is more or less normal around the average - early iterations usually fall through, but the closer we get to N, the more often we exit.</div><div>The problem is that it doesn't hold for long-tail distributions (or anything else that is biased towards low counts) - which may also be common.</div><div><br></div><div>Consider:</div><div>1 iteration - 128</div><div>2 iterations - 64</div><div>3 iterations - 32,</div><div>etc.<br></div><div><br></div><div>In this case, the fall-through probability is always 0.5, regardless of iteration number.</div><div>I'm really not sure which of the two cases is more common/natural.<br></div><div><br></div><div>Anyway, as I said, I have no real intuition here, so I'm ok with doing it either way.</div><div> 
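To make the two cases above concrete, here is a minimal sketch that computes the per-iteration fall-through probability from a trip count histogram. The histogram input is hypothetical (current PGO only records an average trip count), and `fall_through_probabilities`, `long_tail`, and `narrow` are illustrative names, not anything in LLVM.

```python
from fractions import Fraction


def fall_through_probabilities(histogram, peel_count):
    """Return P(loop continues past iteration i | iteration i was reached).

    `histogram` maps a trip count k to the number of loop executions that
    ran exactly k iterations. This is an assumed input: PGO as discussed
    in this thread does not provide it.
    """
    probs = []
    for i in range(1, peel_count + 1):
        reached = sum(n for k, n in histogram.items() if k >= i)
        continued = sum(n for k, n in histogram.items() if k > i)
        probs.append(Fraction(continued, reached) if reached else Fraction(0))
    return probs


# Long-tail case: each trip count is half as frequent as the previous one
# (same halving pattern as 128, 64, 32, ..., with a longer tail).
long_tail = {k: 2 ** (20 - k) for k in range(1, 21)}

# Narrow case (made-up numbers): almost every execution runs 4 iterations.
narrow = {3: 10, 4: 80, 5: 10}

long_tail_probs = [float(p) for p in fall_through_probabilities(long_tail, 4)]
narrow_probs = [float(p) for p in fall_through_probabilities(narrow, 4)]
```

For the halving histogram the first few probabilities stay very close to 0.5, while for the narrow histogram they stay at 1.0 for the early iterations and drop sharply around the average.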
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> and I don't think that happens here.</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>This also doesn't solve the problem of what probability to assign to the loop backedge - if K is the random variable signifying the number of iterations, I think it should be something like 1/(E[K | K > E[K]] - E[K]).</div><div>That is, it depends on the expected number of iterations given that we have more iterations than average, which we don't know, and can't even bound.<br></div><div>E.g. imagine that we have a loop that runs for 1 iteration a million times, and for a million iterations once. The average number of iterations is about 2, but the probability of taking the backedge, once you've reached the loop, is extremely high.</div></div></div></div></blockquote><div><br></div></span><div>We just need to update the existing branch weights data slightly. Ideally, we can first assign branch probabilities for the conditional branches of the cloned iterations, and then use the constraints I mentioned above to adjust the weight. 
However, I think it can be simplified as follows:</div><div><br></div><div>Suppose the branch weight vector is (WB, WE), where WB is the weight of the edge to the loop header and WE is the weight of the edge to the exit block; then the new weight can be something like (WB - m*WE, WE), where m is the number of peeled iterations.</div><div><br></div></div></div></div></blockquote><div><br></div><div>I don't think we really need this simplification - it sounds pretty straightforward to track the weights while assigning them to the peeled branches.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>[Proof]. Assume the fall-through probability of the i-th cloned conditional branch is P_i. The header weight of the first cloned iteration is WE; then the total edge weight from the cloned iterations to the exit block is</div><div><br></div><div> WE * (1-P_1) * (1-P_2) ... (1-P_m). </div><div><br></div><div>so the new exit edge weight of the remaining loop is</div><div><br></div><div>WE * (1 - (1-P_1) * ... * (1-P_m))</div><div><br></div><div>Assuming P_i is close to 1, this approximates to WE.</div><div><br></div></div></div></div></blockquote><div> </div><div>Even for the narrow normal distribution case, at least P_m won't be close to 1. 
I'm not sure this matters in practice - but since tracking the weights doesn't sound hard, I'll try that first.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>Similarly, the new header weight of the loop is about (WB - m*WE)</div><span><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>We could assume something like a uniform distribution between, say, 0 and 2 * N iterations (in which case the fall-through probability is, I think, (2 * N - m - 1) / (2 * N), and the backedge probability is something like 1 - 1/(1.5 * N) ) - but I don't know if that's realistic either.</div></div></div></div></blockquote><div><br></div></span><div>I am not sure making such an assumption about the distribution is a reasonable thing to do. I think it is more reasonable to assume a narrower distribution and adjust the weight in a simple way (we are not doing anything worse than is already happening today). </div><span class="gmail-m_-5055347745365045474m_4426064162476983570m_-6878125144194299342HOEnZb"><font color="#888888"><div><br></div><div>David</div></font></span><span><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Add a fuzz factor to avoid creating extremely biased branch probabilities:<br>
<br>
for instance (N-m)*3/(4*N)<br>
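As a rough illustration of the heuristic, the sketch below evaluates (N - m)/N for each peeled iteration, optionally scaled by a fuzz factor such as 3/4. The function name `peeled_fall_through` and its parameters are hypothetical and only meant to show the arithmetic.

```python
def peeled_fall_through(avg_trip_count, peel_count, fuzz=1.0):
    """Fall-through probability for peeled iterations m = 0 .. peel_count - 1.

    Uses the (N - m) / N heuristic, where N is the average trip count,
    multiplied by `fuzz` (for instance 0.75 for the (N-m)*3/(4*N) variant).
    """
    n = avg_trip_count
    return [fuzz * (n - m) / n for m in range(peel_count)]


# Example with assumed values: average trip count 4, peeling 3 iterations.
plain = peeled_fall_through(4, 3)
fuzzed = peeled_fall_through(4, 3, fuzz=0.75)
```

With the fuzz factor, no peeled branch gets a fall-through probability above 0.75, so the first peeled iteration is no longer annotated as never exiting.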
<br>
<br>
<a href="https://reviews.llvm.org/D25963" rel="noreferrer" target="_blank">https://reviews.llvm.org/D25963</a><br>
<br>
<br>
<br>
</blockquote></span></div><br></div></div>
</blockquote></span></div><br></div></div>
</blockquote></div><br></div></div>
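Tracking the weights while assigning them to the peeled branches could look roughly like the sketch below. It assumes the three constraints from the thread (total exit frequency unchanged, total header frequency unchanged, first peeled header frequency equal to the preheader frequency, taken here to be WE). The function `peel_weights` and the per-iteration probabilities passed to it are hypothetical; this is not the patch's implementation.

```python
def peel_weights(wb, we, fall_through):
    """Propagate weights through peeled iterations.

    wb, we: original latch branch weights (back edge, exit edge).
    fall_through: assumed fall-through probability P_i for each peeled
    iteration, in order.
    Returns (peeled, (new_wb, new_we)), where `peeled` holds one
    (fall_through_weight, exit_weight) pair per peeled iteration.
    """
    header = float(we)                 # every loop entry exits once, so entries == WE
    remaining_header = float(wb + we)  # total header frequency to preserve
    peeled = []
    for p in fall_through:
        remaining_header -= header     # this peeled header takes its share
        peeled.append((header * p, header * (1.0 - p)))
        header *= p                    # weight that reaches the next iteration
    new_we = header                    # whatever enters the residual loop still exits once
    # Clamp in case the assumed probabilities are inconsistent with (wb, we).
    new_wb = max(remaining_header - header, 0.0)
    return peeled, (new_wb, new_we)


# With P_i == 1 this reduces to the (WB - m*WE, WE) simplification.
_, always_fall_through = peel_weights(300, 100, [1.0, 1.0])

# With a less biased branch, part of the exit weight moves to the peeled copy.
peeled_half, residual_half = peel_weights(300, 100, [0.5])
```

In the second case the peeled copy takes 50 of the 100 exits and the residual loop keeps the other 50, so the total exit weight is preserved.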