<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 17, 2016 at 1:04 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">chandlerc added a comment.<br>

<br>

Will try to make a full pass through, thanks for the extensive updates Dehao! One specific point of discussion below:<br>

<br>

<br>

<br>

================<br>

Comment at: lib/Transforms/Scalar/<wbr>LoopSink.cpp:148-170<br>

<span class="">+  // For every iteration:<br>

+  //   * Pick the ColdestBB from SortedLoopBBs<br>

+  //   * Find the set BBsDominatedByColdestBB that satisfy:<br>

+  //     - BBsDominatedByColdestBB is a subset of BBsToSinkInto<br>

+  //     - Every BB in BBsDominatedByColdestBB is dominated by ColdestBB<br>

+  //   * If Freq(ColdestBB) < Freq(BBsDominatedByColdestBB), remove<br>

+  //     BBsDominatedByColdestBB from BBsToSinkInto, add ColdestBB to<br>

</span>----------------<br>

danielcdh wrote:<br>

> chandlerc wrote:<br>

> > This seems really expensive. By my reading this is O(N * L * M) where L is the  number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large.<br>

<span class="">> ><br>

> > Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)<br>

> I initially started with an adhoc algorithm which is O(L * M), but Danny pointed out it is not optimal, so I changed to this optimal algorithm. The lower bound for any sinking algorithm is O(L*M), but if optimal solution is desired, O(N*L*M) is the best I can get.<br>

><br>

</span><span class="">> Yes, this could be expensive when N is large. I practice, I did not see noticeable compile time increase in speccpu2006 benchmarks after applying this patch (and enable the pass in frontend).<br>

><br>

> How about we limit the N to be no more than a certain number to avoid expensive computation in extreme cases?<br>

</span>I'm not worried about N being large. I'm worried about the fact that L is >= to M and so it is quadratic in the number of basic blocks that use each instruction.<br>

<br>

The other thing is that if this scales in compile time by N then it scales in compile time *by how much effect it is having*. If it scales in compile time by M^2, then we pay more and more compile time as loops get larger even if we only sink very few instructions.<br>

<br>

I would either bound M to a small number, and/or look for some way to not have this be quadratic. It seems like a bunch of this should be pre-computable for the loop?<br></blockquote><div><br></div><div>Updated the patch to add a parameter to control the maximum M.</div><div><br></div><div>Thanks,</div><div>Dehao</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

<a href="https://reviews.llvm.org/D22778" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D22778</a><br>

<br>

<br>

<br>

</blockquote></div><br></div></div>