<div dir="ltr">Hi Hal,<div><br></div><div>For the first patch, I've added comments to explain why we run LICM pass after loop unrolling pass, and added a test to check if the runtime unrolling prologue  is promoted out by LICM at -O2. Can you point me which part is not sufficient? For adding a run of <span style="font-size:13.63636302948px">CorrelatedValuePropagation, it's because I found that LICM had dependence on it. If I run LICM only after loop unrolling, llvm will crash with:</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">llvm/lib/Transforms/Scalar/LICM.cpp:196: virtual bool <anonymous namespace>::LICM::runOnLoop(llvm::Loop *, llvm::LPPassManager &): Assertion `InnerAST && "Where is my AST?"' failed.</span><br></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">For the second patch becoming huge, it's caused by moving class. The problem is like,</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><div><span style="font-size:13.63636302948px">class   B;</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">class   A {       </span></div><div><span style="font-size:13.63636302948px">   void   doSomething(B * _b) { //This function is newly added by this patch.</span></div><div><span style="font-size:13.63636302948px">      _b->add();</span></div><div><span style="font-size:13.63636302948px">   }</span></div><div><span style="font-size:13.63636302948px">};</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">class   B {</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">   void   add() {</span></div><div><span style="font-size:13.63636302948px">      ...</span></div><div><span style="font-size:13.63636302948px">   }</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">};</span></div></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">Above code can't be compiled with error: </span><span style="font-size:13.63636302948px">member access into incomplete type 'B' . </span><span style="font-size:13.63636302948px">So I moved class B in front of A.</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">The meaningful changes comparing to last edition are,</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><div style><span style="font-size:13.63636302948px">class InnerLoopVectorizer {</span><br></div></div><div style><span style="font-size:13.63636302948px"><div> public:</div></span></div><div style><div style="font-size:13.63636302948px">+  // Whether runtime check about strides is added.</div><div style="font-size:13.63636302948px">+  bool IsCheckStrides() {</div><div style="font-size:13.63636302948px">+    return Legal->mustCheckStrides();</div><div style="font-size:13.63636302948px">+  }</div><div style><div style><span style="font-size:13.63636302948px">+  // Whether runtime check about memory is added.</span></div><div style><span style="font-size:13.63636302948px">+  bool IsCheckMemory() {</span></div><div style><span style="font-size:13.63636302948px">+    return Legal->getLAI()->getRuntimePointerCheck()->Need;</span></div><div style><span style="font-size:13.63636302948px">+  }</span></div><div style><span style="font-size:13.63636302948px">}</span></div><div style><span style="font-size:13.63636302948px"><br></span></div><div style><div><span style="font-size:13.63636302948px">+      // Add metadata to disable runtime unrolling scalar loop when there's no</span></div><div><span style="font-size:13.63636302948px">+      // runtime check about strides and memory. Because at this situation,</span></div><div><span style="font-size:13.63636302948px">+      // scalar loop is rarely used and not worthy to be unrolled.</span></div><div><span style="font-size:13.63636302948px">+      if (!LB.IsCheckStrides() && !LB.IsCheckMemory())</span></div><div><span style="font-size:13.63636302948px">+        AddRuntimeUnrollDisableMetaData(L);</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">I'm sorry for generating such a huge patch and bring difficulty for code review. Above information can help you to understand it a bit easier.</span></div><div><span style="font-size:13.63636302948px"><br></span></div><div><span style="font-size:13.63636302948px">Thanks,</span></div><div><span style="font-size:13.63636302948px">Kevin</span></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-03-05 0:43 GMT+08:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Kevin,<br>

<br>

Regarding the first patch, you did not comment on (nor add a test for?) also adding a run of CorrelatedValuePropagation. Can you please explain the rationale?<br>

<br>

The vectorizer/unrolling patch is now huge. What happened?<br>

<br>

Regarding the third, this LGTM.<br>

<span class="HOEnZb"><font color="#888888"><br>

 -Hal<br>

</font></span><span class="im HOEnZb"><br>

----- Original Message -----<br>

> From: "Kevin Qin" <<a href="mailto:kevinqindev@gmail.com">kevinqindev@gmail.com</a>><br>

</span><div class="HOEnZb"><div class="h5">> To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>><br>

> Cc: "llvm-commits" <<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>><br>

> Sent: Wednesday, March 4, 2015 3:09:51 AM<br>

> Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and 2 related improvement<br>

><br>

><br>

> Hi Hal,<br>

><br>

><br>

> Thanks a lot for your review. I totally agree with your review<br>

> comments, and here are the updated patches.<br>

><br>

><br>

> Regards,<br>

> Kevin<br>

><br>

><br>

> 2015-03-04 11:29 GMT+08:00 Hal Finkel < <a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a> > :<br>

><br>

><br>

> ----- Original Message -----<br>

> > From: "Kevin Qin" < <a href="mailto:kevinqindev@gmail.com">kevinqindev@gmail.com</a> ><br>

> > To: "llvm-commits" < <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a> ><br>

> > Sent: Friday, February 27, 2015 11:55:05 PM<br>

> > Subject: [AArch64] Enable partial unrolling on cortex-a57 and 2<br>

> > related improvement<br>

> ><br>

> ><br>

> ><br>

> > Hi,<br>

> ><br>

> ><br>

> > Previously, I made commit r219401 that try to enable partial &<br>

> > runtime unrolling on cortex-a57, but I forgot to call base TTI<br>

> > implementation in target specific hook, so those unrolling methods<br>

> > are not really enabled.<br>

> ><br>

> ><br>

> > Here are the patch to get them enabled and 2 related patches to<br>

> > improve it.<br>

> ><br>

> ><br>

> > 0001 - Run LICM pass after loop unrolling pass. Runtime unrollng<br>

> > will<br>

> > introduce a runtime check in loop prologue(you can treat it as a<br>

> > loop preheader). If the unrolled loop is a inner loop, then the<br>

> > proglogue will be inside the outer loop. LICM pass can help to<br>

> > promote the runtime check out if the checked value is loop<br>

> > invariant.<br>

><br>

> I think makes sense, at least for LICM, and is consistent with what<br>

> James observed from the early run of the unroller. Please add a<br>

> comment explaining why those passes are there. This file does not<br>

> have many 'rationale' comments, and this is not a good thing. Why<br>

> are you adding CVP? Can you please add some test cases (we normally<br>

> don't add tests that runs the full pipeline, but for testing the<br>

> pipeline, it is a good idea).<br>

><br>

> ><br>

> ><br>

> > 0002 - Introduce runtime unrolling disable matadata and use it to<br>

> > mark the scalar loop from vectorization. Runtime unrolling is an<br>

> > expensive optimization which can bring benefit only if the loop is<br>

> > hot and iteration number is relatively large enough. For some<br>

> > loops,<br>

> > we know they are not worth to be runtime unrolled. The scalar loop<br>

> > from vectorization is one of the cases.<br>

><br>

> I think this is a good idea. However, I think we might be overlooking<br>

> something. If the purpose of the scalar loop is only to handle the<br>

> 'left over' part of the iteration space that is not divisible by the<br>

> vector length. However, if there are runtime safety checks, and<br>

> those checks generally fail, then the loop could be hot. Can we<br>

> exclude the case where we've emitted safety checks?<br>

><br>

> ><br>

> ><br>

> > 0003 - Enable partial & runtime unrolling on cortex-a57, and double<br>

> > the unrolling threshold if the loop depth > 1. For inner one of<br>

> > nested loops, it is more likely to be a hot loop, and the runtime<br>

> > check can be promoted out from patch 0001, so the overhead is less,<br>

> > we can try a larger threshold to unroll more loops.<br>

> ><br>

><br>

><br>

> + if (L->getLoopDepth() > 1)<br>

> + UP.PartialThreshold *= 2;<br>

><br>

> Please add a comment here.<br>

><br>

> -Hal<br>

><br>

> ><br>

> ><br>

> ><br>

> > Combined above changes together, we can get below performance and<br>

> > code size changes.<br>

> ><br>

> ><br>

> > Benchmark Execution time code bloat<br>

> ><br>

> ><br>

> > spec.cpu2000.179_art -16.567% 8.805%<br>

> > spec.cpu2000.177_mesa -2.771% 1.912%<br>

> > spec.cpu2006.483_xalancbmk -2.555% 0.076%<br>

> > spec.cpu2000.256_bzip2 -1.648% 2.414%<br>

> > spec.cpu2006.433_milc -1.228% 1.353%<br>

> > spec.cpu2006.456_hmmer -1.079% 2.413%<br>

> ><br>

> > spec.cpu2006.462_libquantum 2.492% 1.482%<br>

> > spec.cpu2000.253_perlbmk 1.563% 0.464%<br>

> > spec.cpu2006.450_soplex 1.379% 1.925%<br>

> > spec.cpu2000.186_crafty 1.242% 0.005%<br>

> ><br>

> > spec.geomean -0.546% 0.952%<br>

> ><br>

> ><br>

> > Please review. Thanks.<br>

> ><br>

> ><br>

> > --<br>

> ><br>

> ><br>

> > Best Regards,<br>

> ><br>

> ><br>

> > Kevin Qin<br>

> > _______________________________________________<br>

> > llvm-commits mailing list<br>

> > <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

> > <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

> ><br>

><br>

> --<br>

> Hal Finkel<br>

> Assistant Computational Scientist<br>

> Leadership Computing Facility<br>

> Argonne National Laboratory<br>

><br>

><br>

><br>

><br>

> --<br>

><br>

><br>

> Best Regards,<br>

><br>

><br>

> Kevin Qin<br>

<br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div></div>

</div>