[AArch64] Enable partial unrolling on cortex-a57 and 2 related improvement
Hal Finkel
hfinkel at anl.gov
Thu Mar 5 22:54:03 PST 2015
----- Original Message -----
> From: "Kevin Qin" <kevinqindev at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>
> Sent: Thursday, March 5, 2015 11:44:03 PM
> Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and 2
> related improvement
> Hi Hal,
> Here's the updated one. Please review again.
Okay, LGTM. You need to add something to the LangRef about the metadata too.
Thanks,
Hal
> Thanks,
> Kevin
> 2015-03-06 0:13 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
> > ----- Original Message -----
>
> > > From: "Kevin Qin" < kevinqindev at gmail.com >
>
> > > To: "Hal Finkel" < hfinkel at anl.gov >
>
> > > Cc: "llvm-commits" < llvm-commits at cs.uiuc.edu >
>
> > > Sent: Thursday, March 5, 2015 12:48:09 AM
>
> > > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57 and
> > > 2
> > > related improvement
>
> > >
>
> > >
>
> > > Hi Hal,
>
> > >
>
> > >
>
> > > Thanks for your quick review again. I refactored the patch as
> > > your
>
> > > suggestion. It's a choice between duplicating the method to get
>
> > > whether or not to insert runtime checks, nor duplicating the data
> > > of
>
> > > recording that. And I agreed that the later one is better as it's
>
> > > easy to maintain. Here's the updated patch.
>
> > Okay, but let's make this even easier to maintain: use only one
> > boolean. This way, if we add a new type of safety check, we need a
> > new boolean, and we need to add a new part to the if condition
> > controlling the metadata. Just use one variable: AddedSafetyChecks,
> > set it to true if we add one of any type, and then check that
> > afterward.
>
> > Thanks again,
>
> > Hal
>
> > >
>
> > >
>
> > > Cheers,
>
> > > Kevin
>
> > >
>
> > >
>
> > > 2015-03-05 13:56 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
>
> > >
>
> > >
>
> > > ----- Original Message -----
>
> > > > From: "Kevin Qin" < kevinqindev at gmail.com >
>
> > > > To: "Hal Finkel" < hfinkel at anl.gov >
>
> > > > Cc: "llvm-commits" < llvm-commits at cs.uiuc.edu >
>
> > > > Sent: Wednesday, March 4, 2015 8:12:03 PM
>
> > > > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57
> > > > and
> > > > 2
>
> > > > related improvement
>
> > > >
>
> > > >
>
> > > > Hi Hal,
>
> > > >
>
> > > >
>
> > >
>
> > >
>
> > > > For the first patch, I've added comments to explain why we run
> > > > LICM
>
> > > > pass after loop unrolling pass, and added a test to check if
> > > > the
>
> > > > runtime unrolling prologue is promoted out by LICM at -O2. Can
> > > > you
>
> > > > point me which part is not sufficient? For adding a run of
>
> > > > CorrelatedValuePropagation, it's because I found that LICM had
>
> > > > dependence on it. If I run LICM only after loop unrolling, llvm
>
> > > > will
>
> > > > crash with:
>
> > > >
>
> > > >
>
> > > > llvm/lib/Transforms/Scalar/LICM.cpp:196: virtual bool
> > > > <anonymous
>
> > > > namespace>::LICM::runOnLoop(llvm::Loop *, llvm::LPPassManager
> > > > &):
>
> > > > Assertion `InnerAST && "Where is my AST?"' failed.
>
> > > >
>
> > > >
>
> > > >
>
> > > > For the second patch becoming huge, it's caused by moving
> > > > class.
>
> > > > The
>
> > > > problem is like,
>
> > > >
>
> > > >
>
> > > >
>
> > > > class B;
>
> > > >
>
> > > >
>
> > > > class A {
>
> > > > void doSomething(B * _b) { //This function is newly added by
> > > > this
>
> > > > patch.
>
> > > > _b->add();
>
> > > > }
>
> > > > };
>
> > > >
>
> > > >
>
> > > > class B {
>
> > > >
>
> > > >
>
> > > > void add() {
>
> > > > ...
>
> > > > }
>
> > > >
>
> > > >
>
> > > > };
>
> > > >
>
> > > >
>
> > > > Above code can't be compiled with error: member access into
>
> > > > incomplete type 'B' . So I moved class B in front of A.
>
> > > >
>
> > > >
>
> > > > The meaningful changes comparing to last edition are,
>
> > > >
>
> > > >
>
> > > >
>
> > > > class InnerLoopVectorizer {
>
> > > >
>
> > > >
>
> > > > public:
>
> > > >
>
> > > > + // Whether runtime check about strides is added.
>
> > > > + bool IsCheckStrides() {
>
> > > > + return Legal->mustCheckStrides();
>
> > > > + }
>
> > > >
>
> > > > + // Whether runtime check about memory is added.
>
> > > > + bool IsCheckMemory() {
>
> > > > + return Legal->getLAI()->getRuntimePointerCheck()->Need;
>
> > > > + }
>
> > > > }
>
> > > >
>
> > > >
>
> > > >
>
> > > > + // Add metadata to disable runtime unrolling scalar loop when
>
> > > > there's no
>
> > > > + // runtime check about strides and memory. Because at this
>
> > > > situation,
>
> > > > + // scalar loop is rarely used and not worthy to be unrolled.
>
> > > > + if (!LB.IsCheckStrides() && !LB.IsCheckMemory())
>
> > > > + AddRuntimeUnrollDisableMetaData(L);
>
> > > >
>
> > > >
>
> > > > I'm sorry for generating such a huge patch and bring difficulty
> > > > for
>
> > > > code review. Above information can help you to understand it a
> > > > bit
>
> > > > easier.
>
> > >
>
> > > Okay, I think that it would be better not to repeat the logic for
>
> > > whether or not to insert runtime checks; it would be very each
> > > for
>
> > > these to get out of sync in the future. I think that it would be
>
> > > better to add some boolean to the InnerLoopVectorizer, and just
>
> > > record weather or not a check was added in
>
> > > InnerLoopVectorizer::createEmptyLoop so that we can query it
> > > later.
>
> > >
>
> > > -Hal
>
> > >
>
> > >
>
> > >
>
> > > >
>
> > > >
>
> > > > Thanks,
>
> > > > Kevin
>
> > > >
>
> > > >
>
> > > > 2015-03-05 0:43 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
>
> > > >
>
> > > >
>
> > > > Hi Kevin,
>
> > > >
>
> > > > Regarding the first patch, you did not comment on (nor add a
> > > > test
>
> > > > for?) also adding a run of CorrelatedValuePropagation. Can you
>
> > > > please explain the rationale?
>
> > > >
>
> > > > The vectorizer/unrolling patch is now huge. What happened?
>
> > > >
>
> > > > Regarding the third, this LGTM.
>
> > > >
>
> > > > -Hal
>
> > > >
>
> > > > ----- Original Message -----
>
> > > > > From: "Kevin Qin" < kevinqindev at gmail.com >
>
> > > >
>
> > > >
>
> > > > > To: "Hal Finkel" < hfinkel at anl.gov >
>
> > > > > Cc: "llvm-commits" < llvm-commits at cs.uiuc.edu >
>
> > > > > Sent: Wednesday, March 4, 2015 3:09:51 AM
>
> > > > > Subject: Re: [AArch64] Enable partial unrolling on cortex-a57
> > > > > and
>
> > > > > 2
>
> > > > > related improvement
>
> > > > >
>
> > > > >
>
> > > > > Hi Hal,
>
> > > > >
>
> > > > >
>
> > > > > Thanks a lot for your review. I totally agree with your
> > > > > review
>
> > > > > comments, and here are the updated patches.
>
> > > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Kevin
>
> > > > >
>
> > > > >
>
> > > > > 2015-03-04 11:29 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
>
> > > > >
>
> > > > >
>
> > > > > ----- Original Message -----
>
> > > > > > From: "Kevin Qin" < kevinqindev at gmail.com >
>
> > > > > > To: "llvm-commits" < llvm-commits at cs.uiuc.edu >
>
> > > > > > Sent: Friday, February 27, 2015 11:55:05 PM
>
> > > > > > Subject: [AArch64] Enable partial unrolling on cortex-a57
> > > > > > and
> > > > > > 2
>
> > > > > > related improvement
>
> > > > > >
>
> > > > > >
>
> > > > > >
>
> > > > > > Hi,
>
> > > > > >
>
> > > > > >
>
> > > > > > Previously, I made commit r219401 that try to enable
> > > > > > partial
> > > > > > &
>
> > > > > > runtime unrolling on cortex-a57, but I forgot to call base
> > > > > > TTI
>
> > > > > > implementation in target specific hook, so those unrolling
>
> > > > > > methods
>
> > > > > > are not really enabled.
>
> > > > > >
>
> > > > > >
>
> > > > > > Here are the patch to get them enabled and 2 related
> > > > > > patches
> > > > > > to
>
> > > > > > improve it.
>
> > > > > >
>
> > > > > >
>
> > > > > > 0001 - Run LICM pass after loop unrolling pass. Runtime
>
> > > > > > unrollng
>
> > > > > > will
>
> > > > > > introduce a runtime check in loop prologue(you can treat it
> > > > > > as
>
> > > > > > a
>
> > > > > > loop preheader). If the unrolled loop is a inner loop, then
> > > > > > the
>
> > > > > > proglogue will be inside the outer loop. LICM pass can help
> > > > > > to
>
> > > > > > promote the runtime check out if the checked value is loop
>
> > > > > > invariant.
>
> > > > >
>
> > > > > I think makes sense, at least for LICM, and is consistent
> > > > > with
>
> > > > > what
>
> > > > > James observed from the early run of the unroller. Please add
> > > > > a
>
> > > > > comment explaining why those passes are there. This file does
> > > > > not
>
> > > > > have many 'rationale' comments, and this is not a good thing.
> > > > > Why
>
> > > > > are you adding CVP? Can you please add some test cases (we
>
> > > > > normally
>
> > > > > don't add tests that runs the full pipeline, but for testing
> > > > > the
>
> > > > > pipeline, it is a good idea).
>
> > > > >
>
> > > > > >
>
> > > > > >
>
> > > > > > 0002 - Introduce runtime unrolling disable matadata and use
> > > > > > it
>
> > > > > > to
>
> > > > > > mark the scalar loop from vectorization. Runtime unrolling
> > > > > > is
>
> > > > > > an
>
> > > > > > expensive optimization which can bring benefit only if the
> > > > > > loop
>
> > > > > > is
>
> > > > > > hot and iteration number is relatively large enough. For
> > > > > > some
>
> > > > > > loops,
>
> > > > > > we know they are not worth to be runtime unrolled. The
> > > > > > scalar
>
> > > > > > loop
>
> > > > > > from vectorization is one of the cases.
>
> > > > >
>
> > > > > I think this is a good idea. However, I think we might be
>
> > > > > overlooking
>
> > > > > something. If the purpose of the scalar loop is only to
> > > > > handle
>
> > > > > the
>
> > > > > 'left over' part of the iteration space that is not divisible
> > > > > by
>
> > > > > the
>
> > > > > vector length. However, if there are runtime safety checks,
> > > > > and
>
> > > > > those checks generally fail, then the loop could be hot. Can
> > > > > we
>
> > > > > exclude the case where we've emitted safety checks?
>
> > > > >
>
> > > > > >
>
> > > > > >
>
> > > > > > 0003 - Enable partial & runtime unrolling on cortex-a57,
> > > > > > and
>
> > > > > > double
>
> > > > > > the unrolling threshold if the loop depth > 1. For inner
> > > > > > one
> > > > > > of
>
> > > > > > nested loops, it is more likely to be a hot loop, and the
>
> > > > > > runtime
>
> > > > > > check can be promoted out from patch 0001, so the overhead
> > > > > > is
>
> > > > > > less,
>
> > > > > > we can try a larger threshold to unroll more loops.
>
> > > > > >
>
> > > > >
>
> > > > >
>
> > > > > + if (L->getLoopDepth() > 1)
>
> > > > > + UP.PartialThreshold *= 2;
>
> > > > >
>
> > > > > Please add a comment here.
>
> > > > >
>
> > > > > -Hal
>
> > > > >
>
> > > > > >
>
> > > > > >
>
> > > > > >
>
> > > > > > Combined above changes together, we can get below
> > > > > > performance
>
> > > > > > and
>
> > > > > > code size changes.
>
> > > > > >
>
> > > > > >
>
> > > > > > Benchmark Execution time code bloat
>
> > > > > >
>
> > > > > >
>
> > > > > > spec.cpu2000.179_art -16.567% 8.805%
>
> > > > > > spec.cpu2000.177_mesa -2.771% 1.912%
>
> > > > > > spec.cpu2006.483_xalancbmk -2.555% 0.076%
>
> > > > > > spec.cpu2000.256_bzip2 -1.648% 2.414%
>
> > > > > > spec.cpu2006.433_milc -1.228% 1.353%
>
> > > > > > spec.cpu2006.456_hmmer -1.079% 2.413%
>
> > > > > >
>
> > > > > > spec.cpu2006.462_libquantum 2.492% 1.482%
>
> > > > > > spec.cpu2000.253_perlbmk 1.563% 0.464%
>
> > > > > > spec.cpu2006.450_soplex 1.379% 1.925%
>
> > > > > > spec.cpu2000.186_crafty 1.242% 0.005%
>
> > > > > >
>
> > > > > > spec.geomean -0.546% 0.952%
>
> > > > > >
>
> > > > > >
>
> > > > > > Please review. Thanks.
>
> > > > > >
>
> > > > > >
>
> > > > > > --
>
> > > > > >
>
> > > > > >
>
> > > > > > Best Regards,
>
> > > > > >
>
> > > > > >
>
> > > > > > Kevin Qin
>
> > > > > > _______________________________________________
>
> > > > > > llvm-commits mailing list
>
> > > > > > llvm-commits at cs.uiuc.edu
>
> > > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> > > > > >
>
> > > > >
>
> > > > > --
>
> > > > > Hal Finkel
>
> > > > > Assistant Computational Scientist
>
> > > > > Leadership Computing Facility
>
> > > > > Argonne National Laboratory
>
> > > > >
>
> > > > >
>
> > > > >
>
> > > > >
>
> > > > > --
>
> > > > >
>
> > > > >
>
> > > > > Best Regards,
>
> > > > >
>
> > > > >
>
> > > > > Kevin Qin
>
> > > >
>
> > > > --
>
> > > > Hal Finkel
>
> > > > Assistant Computational Scientist
>
> > > > Leadership Computing Facility
>
> > > > Argonne National Laboratory
>
> > > >
>
> > > >
>
> > > >
>
> > > >
>
> > > > --
>
> > > >
>
> > > >
>
> > > > Best Regards,
>
> > > >
>
> > > >
>
> > > > Kevin Qin
>
> > >
>
> > > --
>
> > > Hal Finkel
>
> > > Assistant Computational Scientist
>
> > > Leadership Computing Facility
>
> > > Argonne National Laboratory
>
> > >
>
> > >
>
> > >
>
> > >
>
> > > --
>
> > >
>
> > >
>
> > > Best Regards,
>
> > >
>
> > >
>
> > > Kevin Qin
>
> > --
>
> > Hal Finkel
>
> > Assistant Computational Scientist
>
> > Leadership Computing Facility
>
> > Argonne National Laboratory
>
> --
> Best Regards,
> Kevin Qin
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150306/7208cc84/attachment.html>
More information about the llvm-commits
mailing list