<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 11, 2016 at 7:19 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class=""><div dir="ltr">On Mon, Jul 11, 2016 at 7:12 PM Xinliang David Li <<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jul 11, 2016 at 6:55 PM, Chandler Carruth via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">chandlerc added inline comments.<br>

<span><br>

================<br>

Comment at: lib/CodeGen/BranchFolding.cpp:656<br>

@@ +655,3 @@<br>

+  // duplicated.<br>

+  if (!AfterPlacement)<br>

+    if (SuccBB && MBB1 != PredBB && MBB2 != PredBB &&<br>

----------------<br>

</span><span>davidxl wrote:<br>

> iteratee wrote:<br>

> > Discussed this with chandlerc. He thinks that passing around the list of blocks that were tail-duplicated too tightly couples the passes, and that we should strive to have non-overlap. He suggested that a list of blocks that were duplicated might be a good debug measure to make sure they aren't being re-merged.<br>

> ><br>

> > Overall I agree with him, and I made a change so there shouldn't be any overlap.<br>

> > Do you want to see the set as a debugging measure?<br>

> I don't agree. Without making it explicit, the the pass coupling does not disappear -- it is still there but in a more implicit but hard to maintain way.<br>

><br>

> For instance, can we guarantee this still work when TailMerger and TailDuplicator uses non-default size thresholds?<br>

</span>My thought was specifically that we should enforce an invariant that these are defined in a non-overlapping way, *and* check it in asserts builds precisely so that things like non-default size thresholds work in a sane way.<br>

<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>The problem is that the implementation maintains the non-overlapping by making assumption about the size limit (the default setting), so when non-default size is used, the assertion will likely to fire off.</div></div></div></div></blockquote><div><br></div></span><div>Having two thresholds like this that are coupled is definitely problematic, but it seems like the solution should be to have a single threshold rather than two.</div><div><br></div><div>Put differently, whether we are merging or duplicating tails, we should have one threshold and maybe a tolerance that provides any desired padding between the two operations. That way it will be impossible to configure this in an overlapping manner.</div></div></div></blockquote><div><br></div><div>Having one threshold here is not the right thing to do for a couple of reasons:</div><div><br></div><div>1) TailMerging's main purpose is to reduce code size. Its threshold is a lower bound -- i.e., it is worthwhile to do tail merging only when the common sequence length is >= threshold.   TailMerge is also possible to have negative performance impact (increased taken branches, worse branch prediction),  so the threshold is set to 3 as it is turned on at O2.  For Os and Oz a smaller value is possible.</div><div> </div><div>2) TailDup's main purpose, on the other hand, is to improve performance. The size needs to be <= than the threshold in order for taildup to happen to avoid excessive size increase. However, under different options with different size/performance trade-offs, the setting is different. For instance, its current default is set at 2, while at Os, the threshold is hard coded to be 1.   In fact, to improve branch prediction, the threshold is greatly increased to 20 when indirect branch is involved.</div><div><br></div><div>Currently, <=2 (tailDup) and >=3 (tailMerge) happen to not overlapping for majority of cases (Kyle's handling is only for special cases), but I can easily see the big benefit of increasing tailDup default threshold to be a larger value, at least for O3, or hot functions with PGO.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

As long as there is the potential for overlap, we have the problem (IMO) that the innards of tail dup or tail merge can do things that user programs cannot. I would much prefer that we have a set of rules that are sufficiently precise and enforced that one pass doesn't have to have a back-channel to communicate to the other pass.<br>

<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Passing a 'filter' is actually the exact mechanism IMO to hide pass implementation details from one another. Is there a better way? Making any assumption about how tailDup is done in tailMerger is the wrong direction to take.</div></div></div></div></blockquote><div><br></div></span><div>This assumes that all duplication of tail code sequences is done by a particular part of the compiler. The input program may have manually done such duplication itself. Same story for merging.</div><div><br></div><div>The behavior shouldn't change when you manually apply a transform to the source program rather than having LLVM apply it for you.</div><div><br></div><div>For that to work, we can't rely on communicating between the two systems, we need to actually define the operations in non-overlapping ways. We do the exact same thing for many other code motion transforms where we will transform in both directions to try and converge in the middle.</div><div><br></div><div><br></div><div>But all of this is not to say that the problem of having two thresholds with a subtle coupling isn't a real problem! It is. We should define the interfaces in a way that clearly surfaces that there is some point above which we will merge and below which we will duplicate and that it is one point (not two) with maybe some delta around the point where we will make no change at all.</div></div></div></blockquote><div><br></div><div><br></div><div>There are also likely cases where TailMerging purely reduces size without affecting performance. One way is to teach tailMerger to be 'smart' about it (instead of just doing simple size benefit as it is done today). However I think doing that is not the right way to go.</div><div><br></div><div>To put it another way, TailMerging can be considered a 'normalization' pass that simply merges all possible common tail sequences (with benefit, so it should have a low threshold).  Later, tailDup will come in to undo some of the decisions made earlier for performance -- so the threshold needs to be opt level/mode dependent. Post layout tailMerge simply needs to honor decisions made by TailDup as it has no way of precise cost analysis.</div><div><br></div><div>David</div><div><br></div><div> </div></div><br></div></div>