[AArch64] Enable partial unrolling on cortex-a57 and 2 related improvement

Wed Mar 4 22:48:09 PST 2015

Hi Hal,

Thanks for your quick review again. I refactored the patch as your
suggestion. It's a choice between duplicating the method to get whether or
not to insert runtime checks, nor duplicating the data of recording that.
And I agreed that the later one is better as it's easy to maintain. Here's
the updated patch.

Cheers,
Kevin

2015-03-05 13:56 GMT+08:00 Hal Finkel <hfinkel at anl.gov>:

> ----- Original Message -----
> > From: "Kevin Qin" <kevinqindev at gmail.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>
> > Sent: Wednesday, March 4, 2015 8:12:03 PM
> > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and 2
> related improvement
> >
> >
> > Hi Hal,
> >
> >
> > For the first patch, I've added comments to explain why we run LICM
> > pass after loop unrolling pass, and added a test to check if the
> > runtime unrolling prologue is promoted out by LICM at -O2. Can you
> > point me which part is not sufficient? For adding a run of
> > CorrelatedValuePropagation, it's because I found that LICM had
> > dependence on it. If I run LICM only after loop unrolling, llvm will
> > crash with:
> >
> >
> > llvm/lib/Transforms/Scalar/LICM.cpp:196: virtual bool <anonymous
> > namespace>::LICM::runOnLoop(llvm::Loop *, llvm::LPPassManager &):
> > Assertion `InnerAST && "Where is my AST?"' failed.
> >
> >
> >
> > For the second patch becoming huge, it's caused by moving class. The
> > problem is like,
> >
> >
> >
> > class B;
> >
> >
> > class A {
> > void doSomething(B * _b) { //This function is newly added by this
> > patch.
> > _b->add();
> > }
> > };
> >
> >
> > class B {
> >
> >
> > void add() {
> > ...
> > }
> >
> >
> > };
> >
> >
> > Above code can't be compiled with error: member access into
> > incomplete type 'B' . So I moved class B in front of A.
> >
> >
> > The meaningful changes comparing to last edition are,
> >
> >
> >
> > class InnerLoopVectorizer {
> >
> >
> > public:
> >
> > + // Whether runtime check about strides is added.
> > + bool IsCheckStrides() {
> > + return Legal->mustCheckStrides();
> > + }
> >
> > + // Whether runtime check about memory is added.
> > + bool IsCheckMemory() {
> > + return Legal->getLAI()->getRuntimePointerCheck()->Need;
> > + }
> > }
> >
> >
> >
> > + // Add metadata to disable runtime unrolling scalar loop when
> > there's no
> > + // runtime check about strides and memory. Because at this
> > situation,
> > + // scalar loop is rarely used and not worthy to be unrolled.
> > + if (!LB.IsCheckStrides() && !LB.IsCheckMemory())
> > + AddRuntimeUnrollDisableMetaData(L);
> >
> >
> > I'm sorry for generating such a huge patch and bring difficulty for
> > code review. Above information can help you to understand it a bit
> > easier.
>
> Okay, I think that it would be better not to repeat the logic for whether
> or not to insert runtime checks; it would be very each for these to get out
> of sync in the future. I think that it would be better to add some boolean
> to the InnerLoopVectorizer, and just record weather or not a check was
> added in InnerLoopVectorizer::createEmptyLoop so that we can query it later.
>
>  -Hal
>
> >
> >
> > Thanks,
> > Kevin
> >
> >
> > 2015-03-05 0:43 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
> >
> >
> > Hi Kevin,
> >
> > Regarding the first patch, you did not comment on (nor add a test
> > for?) also adding a run of CorrelatedValuePropagation. Can you
> > please explain the rationale?
> >
> > The vectorizer/unrolling patch is now huge. What happened?
> >
> > Regarding the third, this LGTM.
> >
> > -Hal
> >
> > ----- Original Message -----
> > > From: "Kevin Qin" < kevinqindev at gmail.com >
> >
> >
> > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > Cc: "llvm-commits" < llvm-commits at cs.uiuc.edu >
> > > Sent: Wednesday, March 4, 2015 3:09:51 AM
> > > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and 2
> > > related improvement
> > >
> > >
> > > Hi Hal,
> > >
> > >
> > > Thanks a lot for your review. I totally agree with your review
> > > comments, and here are the updated patches.
> > >
> > >
> > > Regards,
> > > Kevin
> > >
> > >
> > > 2015-03-04 11:29 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Kevin Qin" < kevinqindev at gmail.com >
> > > > To: "llvm-commits" < llvm-commits at cs.uiuc.edu >
> > > > Sent: Friday, February 27, 2015 11:55:05 PM
> > > > Subject: [AArch64] Enable partial unrolling on cortex-a57 and 2
> > > > related improvement
> > > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > >
> > > > Previously, I made commit r219401 that try to enable partial &
> > > > runtime unrolling on cortex-a57, but I forgot to call base TTI
> > > > implementation in target specific hook, so those unrolling
> > > > methods
> > > > are not really enabled.
> > > >
> > > >
> > > > Here are the patch to get them enabled and 2 related patches to
> > > > improve it.
> > > >
> > > >
> > > > 0001 - Run LICM pass after loop unrolling pass. Runtime unrollng
> > > > will
> > > > introduce a runtime check in loop prologue(you can treat it as a
> > > > loop preheader). If the unrolled loop is a inner loop, then the
> > > > proglogue will be inside the outer loop. LICM pass can help to
> > > > promote the runtime check out if the checked value is loop
> > > > invariant.
> > >
> > > I think makes sense, at least for LICM, and is consistent with what
> > > James observed from the early run of the unroller. Please add a
> > > comment explaining why those passes are there. This file does not
> > > have many 'rationale' comments, and this is not a good thing. Why
> > > are you adding CVP? Can you please add some test cases (we normally
> > > don't add tests that runs the full pipeline, but for testing the
> > > pipeline, it is a good idea).
> > >
> > > >
> > > >
> > > > 0002 - Introduce runtime unrolling disable matadata and use it to
> > > > mark the scalar loop from vectorization. Runtime unrolling is an
> > > > expensive optimization which can bring benefit only if the loop
> > > > is
> > > > hot and iteration number is relatively large enough. For some
> > > > loops,
> > > > we know they are not worth to be runtime unrolled. The scalar
> > > > loop
> > > > from vectorization is one of the cases.
> > >
> > > I think this is a good idea. However, I think we might be
> > > overlooking
> > > something. If the purpose of the scalar loop is only to handle the
> > > 'left over' part of the iteration space that is not divisible by
> > > the
> > > vector length. However, if there are runtime safety checks, and
> > > those checks generally fail, then the loop could be hot. Can we
> > > exclude the case where we've emitted safety checks?
> > >
> > > >
> > > >
> > > > 0003 - Enable partial & runtime unrolling on cortex-a57, and
> > > > double
> > > > the unrolling threshold if the loop depth > 1. For inner one of
> > > > nested loops, it is more likely to be a hot loop, and the runtime
> > > > check can be promoted out from patch 0001, so the overhead is
> > > > less,
> > > > we can try a larger threshold to unroll more loops.
> > > >
> > >
> > >
> > > + if (L->getLoopDepth() > 1)
> > > + UP.PartialThreshold *= 2;
> > >
> > > Please add a comment here.
> > >
> > > -Hal
> > >
> > > >
> > > >
> > > >
> > > > Combined above changes together, we can get below performance and
> > > > code size changes.
> > > >
> > > >
> > > > Benchmark Execution time code bloat
> > > >
> > > >
> > > > spec.cpu2000.179_art -16.567% 8.805%
> > > > spec.cpu2000.177_mesa -2.771% 1.912%
> > > > spec.cpu2006.483_xalancbmk -2.555% 0.076%
> > > > spec.cpu2000.256_bzip2 -1.648% 2.414%
> > > > spec.cpu2006.433_milc -1.228% 1.353%
> > > > spec.cpu2006.456_hmmer -1.079% 2.413%
> > > >
> > > > spec.cpu2006.462_libquantum 2.492% 1.482%
> > > > spec.cpu2000.253_perlbmk 1.563% 0.464%
> > > > spec.cpu2006.450_soplex 1.379% 1.925%
> > > > spec.cpu2000.186_crafty 1.242% 0.005%
> > > >
> > > > spec.geomean -0.546% 0.952%
> > > >
> > > >
> > > > Please review. Thanks.
> > > >
> > > >
> > > > --
> > > >
> > > >
> > > > Best Regards,
> > > >
> > > >
> > > > Kevin Qin
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > >
> > >
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > >
> > >
> > >
> > >
> > > --
> > >
> > >
> > > Best Regards,
> > >
> > >
> > > Kevin Qin
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
> >
> >
> > --
> >
> >
> > Best Regards,
> >
> >
> > Kevin Qin
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>

-- 
Best Regards,

Kevin Qin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150305/a1f498b9/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Introduce-runtime-unrolling-disable-matadata-and-use.patch
Type: text/x-patch
Size: 9636 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150305/a1f498b9/attachment.bin>