<div dir="ltr"><div class="gmail_default" style="font-size:small">Hi cheny@,</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Previously you wrote (in response to Duncan I think)</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small"><i>  The @nonentry_header case in irreducible.ll shows the difference between old and new algorithm. Bottom block should be cold but old algorithm think it is hot because it ignores nested scc-loop.</i></div><div class="gmail_default" style="font-size:small"><i><br></i></div><div class="gmail_default"><div class="gmail_default">I looked over the frequencies being computed for this function and I'm not totally clear on why "bottom" should be considered cold and a node like "right" should be considered hot. Both are involved in a loop of some sort, and at least on the surface it seems that whether the loop is considered innermost is really just an artifact of how the loop SCCs are constructed.</div><div class="gmail_default"><br></div><div class="gmail_default">I dumped out DOT graphs for new and old implementations for this example (attached), and it seems that there are still some major "flow insanities" in both cases. Here is old:</div><div class="gmail_default"><br></div><div class="gmail_default"> <img src="cid:ii_io0apekh0_154969c617558224" width="562" height="456"><br></div><div style="font-size:small"><br></div></div><div><div class="gmail_default" style="font-size:small">and here is new:</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small"><img src="cid:ii_io0aroa41_154969dff3df1c67" width="562" height="454"><br><br></div><div class="gmail_default" style="font-size:small">Hard to see from a high level why this is an improvement.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Thanks, Than</div><div class="gmail_default" style="font-size:small"></div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 21, 2016 at 11:15 AM, Yuanfang Chen via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Wed, Apr 20, 2016 at 3:27 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span><br>

> On 2016-Apr-12, at 21:23, Yuanfang Chen <<a href="mailto:cheny@udel.edu" target="_blank">cheny@udel.edu</a>> wrote:<br>

><br>

> kula created this revision.<br>

> kula added reviewers: dexonsmith, dnovillo.<br>

> kula added a subscriber: llvm-commits.<br>

><br>

> This patch implements the TODOs in BlockFrequencyInfoImpl.h.<br>

<br>

</span>Great!  Thanks for working on this.<br>

<br>

Do you have a concrete testcase where this is valuable?<br></blockquote><div><br></div></span><div>I don't have a real world test cases for this. New algorithm has better precision for cases that old algorithm can handle.  New algorithm handle nested scc-loop with complexity depending on the specific CFG. Simple CFG does not incur high complexity. Old algorithm gives wrong number for nested scc-loop cases.</div><div><br></div><div><span style="font-size:12.8px">The @nonentry_header case in irreducible.ll shows the difference between old and new algorithm. Bottom block should be cold but old </span>algorithm think it is hot because it ignores nested scc-loop.<br></div><span class=""><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<span><br>

> 1. handle nested irreducible loop.<br>

> 2. true weight distribution among multi-heads of irreducible loop, instead of even split + adjustment by backedge later.<br>

> 3. Use geometric series instead of loop scale for mass iteration. loop scale are still used, but for the purpose of scaling down local mass below 1.0.<br>

<br>

</span>The patch is pretty huge.  Is it possible to stage these changes somehow to make it easier to review?<br>

<br>

It's going to take some time for me to page this algorithm back in.  The smaller each change is, the easier it'll be for me ;).<br>

<br>

I haven't looked in detail at the patch yet, but just glancing I saw a few smaller NFC changes (like DEBUG output and variable names) that would should definitely be split out.</blockquote><div><br></div></span><div>Done.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span><br>

> 4. normal code path for reducible loop is not affected.<br>

><br>

> This patch passes all regression test. Several test cases are changed accordingly. I've changed the @nonentry_header case in irreducible.ll to use<br>

> huge weight on the switch instruction to show that 'bottom' block should not be hot.<br>

><br>

> Method:<br>

> 1. for loop a, found all SCCs, create SCC loops and adjust nodes in its parent node list accordingly.<br>

> 2. In topological order, propagate mass on all SCCs where non-trivial SCCs incur resursion.<br>

> 3. compute start term of geometric series.<br>

> 4. for each header, compute its ratio of geometric series by propogation full mass starting from itself and other block hass empty mass. The cumulated backege mass is ratio.<br>

> 5. find the max ratio among headers.<br>

> 6. compute local scaled down mass for all headers with geometric series equation [a/(1-r)]. assume n is infinity. I've add a TODO in file to see if we should use [a*(1-r^n)/(1-r)].<br>

> 7. Propagate with header mass to other blocks in loop.<br>

<br>

</span>What's the worst-case complexity?  How does it compare to the old algorithm?<span><br></span></blockquote><div><br></div></span><div>N: number of blocks in top level irreducible CFG</div><div>A: number of header in irreducible CFG regardless of scc-loop level.</div><div><br></div><div>The major time difference between old and new algorithm is contributed by step 4 (compute geometric series ratio), where for each header, propagate on all blocks of that specific loop.</div><div>Worst-case quadratic when A==N.  Generally O(AN). Common case, A=2. Old algorithm is O(N).</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span>

><br>

> <a href="http://reviews.llvm.org/D19049" rel="noreferrer" target="_blank">http://reviews.llvm.org/D19049</a><br>

><br>

> Files:<br>

>  include/llvm/Analysis/BlockFrequencyInfoImpl.h<br>

>  lib/Analysis/BlockFrequencyInfoImpl.cpp<br>

>  test/Analysis/BlockFrequencyInfo/irreducible.ll<br>

><br>

</span>> <D19049.53516.patch><br>

<br>

</blockquote></span></div><br></div></div>

<br>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

<br></blockquote></div><br></div>