<div dir="ltr">Committed as r231630, r231631, r231632.<div><br></div><div>Thanks,</div><div>Kevin</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-03-06 14:54 GMT+08:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><br><hr><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><span class=""><b>From: </b>"Kevin Qin" <<a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a>><br><b>To: </b>"Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br><b>Cc: </b>"llvm-commits" <<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>><br></span><b>Sent: </b>Thursday, March 5, 2015 11:44:03 PM<span class=""><br><b>Subject: </b>Re: [AArch64] Enable partial unrolling on cortex-a57 and 2 related improvement<br><br></span><div dir="ltr">Hi Hal,<div><br></div><span class=""><div>Here's the updated one. Please review again.</div></span></div></blockquote><br>Okay, LGTM. You need to add something to the LangRef about the metadata too.<br><br>Thanks,<br>Hal<br><br><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div></div><div><br></div><div>Thanks,</div><div>Kevin</div></div><div class="gmail_extra"><br><div class="gmail_quote"><span class="">2015-03-06 0:13 GMT+08:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br></span><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><hr><div><div class="h5"><br>

> From: "Kevin Qin" <<a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a>><br>

> To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>

> Cc: "llvm-commits" <<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>><br>

</div></div></span><div><div class="h5"><span>> Sent: Thursday, March 5, 2015 12:48:09 AM<br>

> Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and 2 related improvement<br>

><br>

><br>

> Hi Hal,<br>

><br>

><br>

</span><span>> Thanks for your quick review again. I refactored the patch as your<br>

> suggestion. It's a choice between duplicating the method to get<br>

> whether or not to insert runtime checks, nor duplicating the data of<br>

> recording that. And I agreed that the later one is better as it's<br>

> easy to maintain. Here's the updated patch.<br>

<br>

</span><span>Okay, but let's make this even easier to maintain: use only one boolean. This way, if we add a new type of safety check, we need a new boolean, and we need to add a new part to the if condition controlling the metadata. Just use one variable: AddedSafetyChecks, set it to true if we add one of any type, and then check that afterward.<br>

<br>

Thanks again,<br>

Hal<br>

<br>

><br>

><br>

</span><div><div>> Cheers,<br>

> Kevin<br>

><br>

><br>

> 2015-03-05 13:56 GMT+08:00 Hal Finkel < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> > :<br>

><br>

><br>

> ----- Original Message -----<br>

> > From: "Kevin Qin" < <a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a> ><br>

> > To: "Hal Finkel" < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> ><br>

> > Cc: "llvm-commits" < <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a> ><br>

> > Sent: Wednesday, March 4, 2015 8:12:03 PM<br>

> > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and 2<br>

> > related improvement<br>

> ><br>

> ><br>

> > Hi Hal,<br>

> ><br>

> ><br>

><br>

><br>

> > For the first patch, I've added comments to explain why we run LICM<br>

> > pass after loop unrolling pass, and added a test to check if the<br>

> > runtime unrolling prologue is promoted out by LICM at -O2. Can you<br>

> > point me which part is not sufficient? For adding a run of<br>

> > CorrelatedValuePropagation, it's because I found that LICM had<br>

> > dependence on it. If I run LICM only after loop unrolling, llvm<br>

> > will<br>

> > crash with:<br>

> ><br>

> ><br>

> > llvm/lib/Transforms/Scalar/LICM.cpp:196: virtual bool <anonymous<br>

> > namespace>::LICM::runOnLoop(llvm::Loop *, llvm::LPPassManager &):<br>

> > Assertion `InnerAST && "Where is my AST?"' failed.<br>

> ><br>

> ><br>

> ><br>

> > For the second patch becoming huge, it's caused by moving class.<br>

> > The<br>

> > problem is like,<br>

> ><br>

> ><br>

> ><br>

> > class B;<br>

> ><br>

> ><br>

> > class A {<br>

> > void doSomething(B * _b) { //This function is newly added by this<br>

> > patch.<br>

> > _b->add();<br>

> > }<br>

> > };<br>

> ><br>

> ><br>

> > class B {<br>

> ><br>

> ><br>

> > void add() {<br>

> > ...<br>

> > }<br>

> ><br>

> ><br>

> > };<br>

> ><br>

> ><br>

> > Above code can't be compiled with error: member access into<br>

> > incomplete type 'B' . So I moved class B in front of A.<br>

> ><br>

> ><br>

> > The meaningful changes comparing to last edition are,<br>

> ><br>

> ><br>

> ><br>

> > class InnerLoopVectorizer {<br>

> ><br>

> ><br>

> > public:<br>

> ><br>

> > + // Whether runtime check about strides is added.<br>

> > + bool IsCheckStrides() {<br>

> > + return Legal->mustCheckStrides();<br>

> > + }<br>

> ><br>

> > + // Whether runtime check about memory is added.<br>

> > + bool IsCheckMemory() {<br>

> > + return Legal->getLAI()->getRuntimePointerCheck()->Need;<br>

> > + }<br>

> > }<br>

> ><br>

> ><br>

> ><br>

> > + // Add metadata to disable runtime unrolling scalar loop when<br>

> > there's no<br>

> > + // runtime check about strides and memory. Because at this<br>

> > situation,<br>

> > + // scalar loop is rarely used and not worthy to be unrolled.<br>

> > + if (!LB.IsCheckStrides() && !LB.IsCheckMemory())<br>

> > + AddRuntimeUnrollDisableMetaData(L);<br>

> ><br>

> ><br>

> > I'm sorry for generating such a huge patch and bring difficulty for<br>

> > code review. Above information can help you to understand it a bit<br>

> > easier.<br>

><br>

> Okay, I think that it would be better not to repeat the logic for<br>

> whether or not to insert runtime checks; it would be very each for<br>

> these to get out of sync in the future. I think that it would be<br>

> better to add some boolean to the InnerLoopVectorizer, and just<br>

> record weather or not a check was added in<br>

> InnerLoopVectorizer::createEmptyLoop so that we can query it later.<br>

><br>

> -Hal<br>

><br>

><br>

><br>

> ><br>

> ><br>

> > Thanks,<br>

> > Kevin<br>

> ><br>

> ><br>

> > 2015-03-05 0:43 GMT+08:00 Hal Finkel < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> > :<br>

> ><br>

> ><br>

> > Hi Kevin,<br>

> ><br>

> > Regarding the first patch, you did not comment on (nor add a test<br>

> > for?) also adding a run of CorrelatedValuePropagation. Can you<br>

> > please explain the rationale?<br>

> ><br>

> > The vectorizer/unrolling patch is now huge. What happened?<br>

> ><br>

> > Regarding the third, this LGTM.<br>

> ><br>

> > -Hal<br>

> ><br>

> > ----- Original Message -----<br>

> > > From: "Kevin Qin" < <a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a> ><br>

> ><br>

> ><br>

> > > To: "Hal Finkel" < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> ><br>

> > > Cc: "llvm-commits" < <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a> ><br>

> > > Sent: Wednesday, March 4, 2015 3:09:51 AM<br>

> > > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and<br>

> > > 2<br>

> > > related improvement<br>

> > ><br>

> > ><br>

> > > Hi Hal,<br>

> > ><br>

> > ><br>

> > > Thanks a lot for your review. I totally agree with your review<br>

> > > comments, and here are the updated patches.<br>

> > ><br>

> > ><br>

> > > Regards,<br>

> > > Kevin<br>

> > ><br>

> > ><br>

> > > 2015-03-04 11:29 GMT+08:00 Hal Finkel < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> > :<br>

> > ><br>

> > ><br>

> > > ----- Original Message -----<br>

> > > > From: "Kevin Qin" < <a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a> ><br>

> > > > To: "llvm-commits" < <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a> ><br>

> > > > Sent: Friday, February 27, 2015 11:55:05 PM<br>

> > > > Subject: [AArch64] Enable partial unrolling on cortex-a57 and 2<br>

> > > > related improvement<br>

> > > ><br>

> > > ><br>

> > > ><br>

> > > > Hi,<br>

> > > ><br>

> > > ><br>

> > > > Previously, I made commit r219401 that try to enable partial &<br>

> > > > runtime unrolling on cortex-a57, but I forgot to call base TTI<br>

> > > > implementation in target specific hook, so those unrolling<br>

> > > > methods<br>

> > > > are not really enabled.<br>

> > > ><br>

> > > ><br>

> > > > Here are the patch to get them enabled and 2 related patches to<br>

> > > > improve it.<br>

> > > ><br>

> > > ><br>

> > > > 0001 - Run LICM pass after loop unrolling pass. Runtime<br>

> > > > unrollng<br>

> > > > will<br>

> > > > introduce a runtime check in loop prologue(you can treat it as<br>

> > > > a<br>

> > > > loop preheader). If the unrolled loop is a inner loop, then the<br>

> > > > proglogue will be inside the outer loop. LICM pass can help to<br>

> > > > promote the runtime check out if the checked value is loop<br>

> > > > invariant.<br>

> > ><br>

> > > I think makes sense, at least for LICM, and is consistent with<br>

> > > what<br>

> > > James observed from the early run of the unroller. Please add a<br>

> > > comment explaining why those passes are there. This file does not<br>

> > > have many 'rationale' comments, and this is not a good thing. Why<br>

> > > are you adding CVP? Can you please add some test cases (we<br>

> > > normally<br>

> > > don't add tests that runs the full pipeline, but for testing the<br>

> > > pipeline, it is a good idea).<br>

> > ><br>

> > > ><br>

> > > ><br>

> > > > 0002 - Introduce runtime unrolling disable matadata and use it<br>

> > > > to<br>

> > > > mark the scalar loop from vectorization. Runtime unrolling is<br>

> > > > an<br>

> > > > expensive optimization which can bring benefit only if the loop<br>

> > > > is<br>

> > > > hot and iteration number is relatively large enough. For some<br>

> > > > loops,<br>

> > > > we know they are not worth to be runtime unrolled. The scalar<br>

> > > > loop<br>

> > > > from vectorization is one of the cases.<br>

> > ><br>

> > > I think this is a good idea. However, I think we might be<br>

> > > overlooking<br>

> > > something. If the purpose of the scalar loop is only to handle<br>

> > > the<br>

> > > 'left over' part of the iteration space that is not divisible by<br>

> > > the<br>

> > > vector length. However, if there are runtime safety checks, and<br>

> > > those checks generally fail, then the loop could be hot. Can we<br>

> > > exclude the case where we've emitted safety checks?<br>

> > ><br>

> > > ><br>

> > > ><br>

> > > > 0003 - Enable partial & runtime unrolling on cortex-a57, and<br>

> > > > double<br>

> > > > the unrolling threshold if the loop depth > 1. For inner one of<br>

> > > > nested loops, it is more likely to be a hot loop, and the<br>

> > > > runtime<br>

> > > > check can be promoted out from patch 0001, so the overhead is<br>

> > > > less,<br>

> > > > we can try a larger threshold to unroll more loops.<br>

> > > ><br>

> > ><br>

> > ><br>

> > > + if (L->getLoopDepth() > 1)<br>

> > > + UP.PartialThreshold *= 2;<br>

> > ><br>

> > > Please add a comment here.<br>

> > ><br>

> > > -Hal<br>

> > ><br>

> > > ><br>

> > > ><br>

> > > ><br>

> > > > Combined above changes together, we can get below performance<br>

> > > > and<br>

> > > > code size changes.<br>

> > > ><br>

> > > ><br>

> > > > Benchmark Execution time code bloat<br>

> > > ><br>

> > > ><br>

> > > > spec.cpu2000.179_art -16.567% 8.805%<br>

> > > > spec.cpu2000.177_mesa -2.771% 1.912%<br>

> > > > spec.cpu2006.483_xalancbmk -2.555% 0.076%<br>

> > > > spec.cpu2000.256_bzip2 -1.648% 2.414%<br>

> > > > spec.cpu2006.433_milc -1.228% 1.353%<br>

> > > > spec.cpu2006.456_hmmer -1.079% 2.413%<br>

> > > ><br>

> > > > spec.cpu2006.462_libquantum 2.492% 1.482%<br>

> > > > spec.cpu2000.253_perlbmk 1.563% 0.464%<br>

> > > > spec.cpu2006.450_soplex 1.379% 1.925%<br>

> > > > spec.cpu2000.186_crafty 1.242% 0.005%<br>

> > > ><br>

> > > > spec.geomean -0.546% 0.952%<br>

> > > ><br>

> > > ><br>

> > > > Please review. Thanks.<br>

> > > ><br>

> > > ><br>

> > > > --<br>

> > > ><br>

> > > ><br>

> > > > Best Regards,<br>

> > > ><br>

> > > ><br>

> > > > Kevin Qin<br>

> > > > _______________________________________________<br>

> > > > llvm-commits mailing list<br>

> > > > <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

> > > > <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

> > > ><br>

> > ><br>

> > > --<br>

> > > Hal Finkel<br>

> > > Assistant Computational Scientist<br>

> > > Leadership Computing Facility<br>

> > > Argonne National Laboratory<br>

> > ><br>

> > ><br>

> > ><br>

> > ><br>

> > > --<br>

> > ><br>

> > ><br>

> > > Best Regards,<br>

> > ><br>

> > ><br>

> > > Kevin Qin<br>

> ><br>

> > --<br>

> > Hal Finkel<br>

> > Assistant Computational Scientist<br>

> > Leadership Computing Facility<br>

> > Argonne National Laboratory<br>

> ><br>

> ><br>

> ><br>

> ><br>

> > --<br>

> ><br>

> ><br>

> > Best Regards,<br>

> ><br>

> ><br>

> > Kevin Qin<br>

><br>

> --<br>

> Hal Finkel<br>

> Assistant Computational Scientist<br>

> Leadership Computing Facility<br>

> Argonne National Laboratory<br>

><br>

><br>

><br>

><br>

> --<br>

><br>

><br>

> Best Regards,<br>

><br>

><br>

> Kevin Qin<br>

<br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</div></div></div></div></blockquote></div><div><div class="h5"><br><br clear="all"><div><br></div>-- <br><div><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div></div>

</div></div></div>

</blockquote><div><div class="h5"><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></div></div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div></div>

</div>