<div dir="ltr">Hi Hal,<div><br></div><div>Thanks a lot for your review. I totally agree with your review comments, and here are the updated patches.</div><div><br></div><div>Regards,</div><div>Kevin</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-03-04 11:29 GMT+08:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">----- Original Message -----<br>

> From: "Kevin Qin" <<a href="mailto:kevinqindev@gmail.com">kevinqindev@gmail.com</a>><br>

> To: "llvm-commits" <<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>><br>

> Sent: Friday, February 27, 2015 11:55:05 PM<br>

> Subject: [AArch64] Enable partial unrolling on cortex-a57 and 2 related       improvement<br>

><br>

><br>

><br>

> Hi,<br>

><br>

><br>

> Previously, I made commit r219401 that try to enable partial &<br>

> runtime unrolling on cortex-a57, but I forgot to call base TTI<br>

> implementation in target specific hook, so those unrolling methods<br>

> are not really enabled.<br>

><br>

><br>

> Here are the patch to get them enabled and 2 related patches to<br>

> improve it.<br>

><br>

><br>

> 0001 - Run LICM pass after loop unrolling pass. Runtime unrollng will<br>

> introduce a runtime check in loop prologue(you can treat it as a<br>

> loop preheader). If the unrolled loop is a inner loop, then the<br>

> proglogue will be inside the outer loop. LICM pass can help to<br>

> promote the runtime check out if the checked value is loop<br>

> invariant.<br>

<br>

</span>I think makes sense, at least for LICM, and is consistent with what James observed from the early run of the unroller. Please add a comment explaining why those passes are there. This file does not have many 'rationale' comments, and this is not a good thing. Why are you adding CVP? Can you please add some test cases (we normally don't add tests that runs the full pipeline, but for testing the pipeline, it is a good idea).<br>

<span class=""><br>

><br>

><br>

> 0002 - Introduce runtime unrolling disable matadata and use it to<br>

> mark the scalar loop from vectorization. Runtime unrolling is an<br>

> expensive optimization which can bring benefit only if the loop is<br>

> hot and iteration number is relatively large enough. For some loops,<br>

> we know they are not worth to be runtime unrolled. The scalar loop<br>

> from vectorization is one of the cases.<br>

<br>

</span>I think this is a good idea. However, I think we might be overlooking something. If the purpose of the scalar loop is only to handle the 'left over' part of the iteration space that is not divisible by the vector length. However, if there are runtime safety checks, and those checks generally fail, then the loop could be hot. Can we exclude the case where we've emitted safety checks?<br>

<span class=""><br>

><br>

><br>

> 0003 - Enable partial & runtime unrolling on cortex-a57, and double<br>

> the unrolling threshold if the loop depth > 1. For inner one of<br>

> nested loops, it is more likely to be a hot loop, and the runtime<br>

> check can be promoted out from patch 0001, so the overhead is less,<br>

> we can try a larger threshold to unroll more loops.<br>

><br>

<br>

<br>

</span>+  if (L->getLoopDepth() > 1)<br>

+    UP.PartialThreshold *= 2;<br>

<br>

Please add a comment here.<br>

<br>

 -Hal<br>

<span class=""><br>

><br>

><br>

><br>

> Combined above changes together, we can get below performance and<br>

> code size changes.<br>

><br>

><br>

> Benchmark Execution time code bloat<br>

><br>

><br>

> spec.cpu2000.179_art -16.567% 8.805%<br>

> spec.cpu2000.177_mesa -2.771% 1.912%<br>

> spec.cpu2006.483_xalancbmk -2.555% 0.076%<br>

> spec.cpu2000.256_bzip2 -1.648% 2.414%<br>

> spec.cpu2006.433_milc -1.228% 1.353%<br>

> spec.cpu2006.456_hmmer -1.079% 2.413%<br>

><br>

> spec.cpu2006.462_libquantum 2.492% 1.482%<br>

> spec.cpu2000.253_perlbmk 1.563% 0.464%<br>

> spec.cpu2006.450_soplex 1.379% 1.925%<br>

> spec.cpu2000.186_crafty 1.242% 0.005%<br>

><br>

> spec.geomean -0.546% 0.952%<br>

><br>

><br>

> Please review. Thanks.<br>

><br>

><br>

> --<br>

><br>

><br>

> Best Regards,<br>

><br>

><br>

> Kevin Qin<br>

</span>> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

><br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div></div>

</div>