[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Hal Finkel hfinkel at anl.gov
Thu Jan 15 12:14:43 PST 2015


----- Original Message -----
> From: "Ana Pazos" <apazos at codeaurora.org>
> To: "Daniel Berlin" <dberlin at dberlin.org>
> Cc: "George Burgess IV" <george.burgess.iv at gmail.com>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>,
> "Jiangning Liu" <Jiangning.Liu at arm.com>
> Sent: Thursday, January 15, 2015 1:33:57 PM
> Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting a57	numbers
> 
> 
> 
> 
> 
> Daniel, don’t we need to fix the order of invoking alias analyses in
> lib/Transforms/IPO/PassManagerBuilder.cpp as well?
> 
> 
> Your patch fixed the order in lib/CodeGen/Passes.cpp and the
> delegation code in lib/Analysis/CFLAliasAnalysis.cpp.

Yes, the order in lib/CodeGen/Passes.cpp and lib/Transforms/IPO/PassManagerBuilder.cpp should be kept consistent.

 -Hal

> 
> 
> 
> Thanks,
> 
> Ana.
> 
> 
> 
> From: Daniel Berlin [mailto:dberlin at dberlin.org]
> Sent: Wednesday, January 14, 2015 1:10 PM
> To: Ana Pazos
> Cc: George Burgess IV; Nick Lewycky; Jiangning Liu; LLVM Developers
> Mailing List
> Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting
> a57 numbers
> 
> 
> 
> 
> Oh, sorry, i didn't rebase it when i changed the fix, you would have
> had to apply the first on top of the second.
> 
> 
> 
> Here is one against HEAD
> 
> 
> 
> 
> 
> On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos < apazos at codeaurora.org >
> wrote:
> 
> 
> 
> 
> 
> Daniel, your patch does not apply cleanly. Are you on the tip? The
> code I see there is no line if (QueryResult == MayAlias||
> QueryResult == PartialAlias) to be removed.
> 
> Thanks,
> 
> Ana.
> 
> 
> 
> 
> 
> From: George Burgess IV [mailto: george.burgess.iv at gmail.com ]
> Sent: Wednesday, January 14, 2015 10:31 AM
> To: Daniel Berlin
> Cc: Nick Lewycky; Ana Pazos; Jiangning Liu; LLVM Developers Mailing
> List
> Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting
> a57 numbers
> 
> 
> 
> 
> 
> 
> 
> 
> Inline
> 
> - George
> 
> 
> 
> On Jan 14, 2015, at 10:49 AM, Daniel Berlin < dberlin at dberlin.org >
> wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky < nlewycky at google.com
> > wrote:
> 
> 
> 
> 
> 
> 
> On 13 January 2015 at 22:11, Daniel Berlin < dberlin at dberlin.org >
> wrote:
> 
> 
> 
> 
> This is caused by CFLAA returning PartialAlias for a query that
> BasicAA can prove is NoAlias.
> 
> 
> 
> 
> 
> One of them is wrong. Which one?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> CFL-AA.
> 
> 
> 
> 
> 
> Right now it checks whether two things come out to be the same
> stratified info and have the same index, and if so, returns
> PartialAlias. This is wrong, because it never knows why two things
> got unified, only that they did.
> 
> 
> 
> 
> 
> Therefore, the current check it does where it returns PartialAlias
> seems wrong.
> 
> 
> 
> 
> 
> George, why does it return PartialAlias?
> 
> It shouldn't -- unless I misinterpreted an earlier discussion, I was
> under the impression that we were testing with that line updated to
> return MayAlias. We do need to update that in LLVM though; thanks
> for the fix.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> IMHO, you can't ever prove any kind of must-alias or partialalias
> info with CFL-AA, unless i'm missing something.
> 
> Agreed.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I'm not sure from your description that this is a chaining issue.
> PartialAlias doesn't chain and isn't supposed to, it's a final
> answer just like NoAlias and MustAlias are. You return partial alias
> when you have proven that the accesses are overlapping, meaning that
> it is proven to neither be no alias nor must alias.
> 
> 
> Well, it's still chaining wrong, because it did not change on
> MayAlias, but you are right that it is not the issue i identified :)
> 
> 
> 
> 
> 
> Anyway, updated patch attached.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Nick
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I've attached a patch that fixes it, but truthfully, chaining seems
> somewhat badly designed IMHO.
> 
> 
> 
> 
> 
> It should let you pass along the result you've gotten so far, so that
> you don't have CFLAA get PartialAlias, chain, and have BasicAA get
> MayAlias.
> 
> 
> 
> 
> 
> (IE it should let it work strictly down the hierarchy).
> 
> 
> 
> 
> 
> Note that this patch breaks the CFLAliasAnalysis tests because it no
> longer returns PartialAlias in some cases it is expecting it.
> 
> 
> 
> 
> 
> I don't have time ATM to fix chaining completely, so if someone wants
> to shepherd this patch through, that would be appreciated.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin < dberlin at dberlin.org
> > wrote:
> 
> 
> 
> Can you send me actual LLVM IR or a preprocessed source from using
> -E?
> I don't have a machine handy that has headers that target that arch.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin < dberlin at dberlin.org
> > wrote:
> 
> 
> 
> Anything other than noalias or mustalias should be getting passed
> down the stack, so either that is not happening or CFL aa is giving
> better answers and something does worse with those better answers.
> I'll take a look this evening
> 
> 
> On Jan 13, 2015 3:58 PM, "Ana Pazos" < apazos at codeaurora.org > wrote:
> 
> 
> 
> Hi folks,
> 
> Moving the discussion to llvm.dev.
> 
> None of the changes we talked earlier help.
> 
> Find attached the C source code that you can use to reproduce the
> issue.
> 
> clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast
> -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm
> LICM hoisting to while.body.lr.ph : %21 = load double** %arrayidx8,
> align 8, !tbaa !5
> LICM hoisting to while.body.lr.ph : %arrayidx72 = getelementptr
> inbounds double* %11, i64 1
> LICM hoisting to while.body.lr.ph : %arrayidx81 = getelementptr
> inbounds double* %11, i64 2
> LICM hoisting to for.body.lr.ph : %2 = ptrtoint i32* %Index to i64
> 
> clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast
> -fno-math-errno test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm
> -debug-only=licm
> LICM hoisting to for.body.lr.ph : %2 = ptrtoint i32* %Index to i64
> 
> Why CFL AA cannot allow hoisting this out: %21 = load double**
> %arrayidx8, align 8, !tbaa !5
> 
> Which leads to this extra load in assembly code: ldr x14, [x4, x9,
> lsl #3]
> 
> Thanks!
> Ana.
> 
> -----Original Message-----
> From: Hal Finkel [mailto: hfinkel at anl.gov ]
> Sent: Tuesday, January 13, 2015 12:51 AM
> To: Chandler Carruth
> Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George
> Burgess IV
> Subject: Re: question about enabling cfl-aa and collecting a57
> numbers
> 
> ----- Original Message -----
> > From: "Chandler Carruth" < chandlerc at gmail.com >
> > To: "Ana Pazos" < apazos at codeaurora.org >, "Daniel Berlin" <
> > dberlin at dberlin.org >, "George Burgess IV"
> > < george.burgess.iv at gmail.com >, "Hal Finkel" < hfinkel at anl.gov >
> > Cc: "Jiangning Liu" < Jiangning.Liu at arm.com >, "Ana Pazos"
> > < apazos at quicinc.com >
> > Sent: Monday, January 12, 2015 6:43:44 PM
> > Subject: Re: question about enabling cfl-aa and collecting a57
> > numbers
> > 
> > Not quite:
> > 
> > 
> > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org
> > >
> > wrote:
> > 
> > Thanks George and Daniel,
> > 
> > 
> > 
> > If the recommended order is “–basicaa –cfla-aa” it means we should
> > fix
> > the trunk code that processes the flags -mllvm
> > use-cfl-aa-in-codegen
> > and –mllvm use-cfl-aa, right?
> > 
> > 
> > 
> > In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we
> > see
> > this sequence:
> > 
> > 
> > 
> > if (UseCFLAA)
> > 
> > PM.add(createCFLAliasAnalysisPass());
> > 
> > PM.add(createTypeBasedAliasAnalysisPass());
> > 
> > PM.add(createScopedNoAliasAAPass());
> > 
> > PM.add(createBasicAliasAnalysisPass());
> > 
> > 
> > 
> > So are you recommending changing to the sequence below instead?
> > 
> > 
> > Not quite:
> > 
> > 
> > addPass(createTypeBasedAliasAnalysisPass());
> > 
> > addPass(createScopedNoAliasAAPass());
> > You want it here.
> > 
> > addPass(createBasicAliasAnalysisPass());
> > 
> > if (UseCFLAA)
> > 
> > addPass(createCFLAliasAnalysisPass());
> > 
> > 
> > The way to think about this is as follows:
> > 
> > 
> > if TBAA or ScopedNoAlias *can* return results, you want them to.
> > They
> > are reflecting a-priori knowledge of aliasing.
> > 
> > 
> > if both of those say "maybe", you want to ask CFL. If CFL says
> > "maybe"
> > you want to ask "BasicAA".
> 
> I could very well be misunderstanding something, but I recall that we
> add BasicAA last so that it will be queried first. We do this
> because of our BasicAA delegation hack: We want a MustAlias from
> BasicAA to override a NoAlias from TBAA, so that we can catch cases
> of basic local type punning. Thus, assuming CFL delegates properly,
> I fail to see how switching its order w.r.t. the metadata-based
> analysis changes anything.
> 
> -Hal
> 
> > 
> > 
> > It's possible that you could as a compile-time hack flip the last
> > two
> > to your suggested ordering, but a) it should really only be a
> > compile
> > time hack, and b) we haven't really tested that BasicAA actually
> > delegates rather than directly returning "maybe".
> > 
> > 
> > Thanks,
> > 
> > Ana.
> > 
> > 
> > 
> > From: Daniel Berlin [mailto: dberlin at dberlin.org ]
> > Sent: Friday, January 09, 2015 6:41 PM
> > To: George Burgess IV; Hal Finkel
> > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc
> > 
> > 
> > Subject: Re: question about enabling cfl-aa and collecting a57
> > numbers
> > 
> > 
> > Right. I would not try with -cfl-aa -basicaa as the order, it
> > should
> > be the other way around.
> > 
> > 
> > If you discover performance regressions with *that*, that would be
> > highly interesting.
> > 
> > 
> > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV <
> > george.burgess.iv at gmail.com > wrote:
> > 
> > 
> > 
> > Thanks for the notes and observations Ana! :)
> > 
> > >> Hal, what do you think? Does it meant that cfl-aa is more
> > >> conservative (note I set the right order -cfa-aa followed by
> > >> -basicaa)?
> > Currently, yes. The implementation as it is is focused heavily on
> > being fast, and on trying to cover cases that BasicAA can’t easily
> > cover itself. If our end goal is to use this as a replacement for
> > BasicAA, there’s probably quite a few low-hanging fruit for
> > accuracy
> > improvements, and we can start playing with sacrificing speed in
> > pursuit of greater accuracy. On the other hand, if the end goal is
> > to
> > append this pass to the list of AA passes already being performed,
> > then I’d be interested in seeing the difference in perf between
> > `-basicaa` and `-basicaa -cfl-aa`, if any.
> > 
> > WRT this issue in particular, I believe CFLAA answers
> > conservatively
> > because:
> > A. it doesn’t take into account constant pointer offsets (i.e. it
> > considers a[0]..a[n] all as the same “address”) B. CFLAA is
> > entirely
> > context insensitive
> > 
> > IIRC, BasicAA is somewhat context sensitive, and it does take into
> > account constant pointer offsets, so it can be more accurate in
> > this
> > case. That being said, I don’t have access to the SPEC code, so I
> > can
> > only speculate. :)
> > 
> > George
> > 
> > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> > > 
> > > Ana,
> > > 
> > > Thanks so much for looking into this. I'm adding some additional
> > > relevant people to the CC line...
> > > 
> > > -Hal
> > > 
> > > ----- Original Message -----
> > >> From: "Ana Pazos" < apazos at codeaurora.org >
> > >> To: "Ana Pazos" < apazos at quicinc.com >, "Hal Finkel" <
> > >> hfinkel at anl.gov >, "Jiangning Liu" < Jiangning.Liu at arm.com >
> > >> Sent: Friday, January 9, 2015 3:12:04 PM
> > >> Subject: RE: question about enabling cfl-aa and collecting a57
> > >> numbers
> > >> 
> > >> Hi Hal and Jiangning,
> > >> 
> > >> I started to look at the effect of cfl-aa on SPEC.
> > >> 
> > >> The first observation is that I see more redundant loads not
> > >> being
> > >> hoisted out of loops by LICM.
> > >> 
> > >> Due to license restrictions, I cannot paste SPEC code here, but
> > >> I
> > >> think you have access to it.
> > >> 
> > >> I tried to simplify a top function in equake (quake.c smvp) to
> > >> show
> > >> the issue. See the attached reduced test case.
> > >> 
> > >> If you look at the the source code you will see something like:
> > >> for {
> > >> ...load v[i][0]...
> > >> while {
> > >> ...load v[i][0]...
> > >> store w[col][0]....
> > >> }
> > >> ...load w[i][0]...
> > >> store w[i][0]...
> > >> }
> > >> 
> > >> It should be possible to avoid load &v[i][0], &v[i][1],
> > >> &vi[i][2]
> > >> inside the while loop.
> > >> 
> > >> In the simplified example that would be this instruction:
> > >> %21 = load double** %arrayidx8, align 8, !tbaa !5
> > >> 
> > >> And the pointer for the load instruction:
> > >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom
> > >> 
> > >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS =
> > >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod.
> > >> 
> > >> With basicaa, only loads are in the alias set, but with cfl-aa
> > >> you
> > >> find loads and stores, so isMod returns true.
> > >> 
> > >> AliasSet with basicaa:
> > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double***
> > >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8),
> > >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double**
> > >> %arrayidx77, 8), (double** %arrayidx85, 8)
> > >> $5 = void
> > >> 
> > >> AliasSet with cfa-aa:
> > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32*
> > >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8),
> > >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8),
> > >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double*
> > >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8,
> > >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23,
> > >> 8),
> > >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double*
> > >> %26,
> > >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8),
> > >> (double*
> > >> %29, 8)
> > >> $8 = void
> > >> 
> > >> You can reproduce it with the commands:
> > >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o
> > >> out
> > >> 
> > >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out
> > >> LICM
> > >> hoisting to while.body.lr.ph : %21 = load double** %arrayidx8,
> > >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph :
> > >> %arrayidx72 =
> > >> getelementptr inbounds double* %11, i64 1 LICM hoisting to
> > >> while.body.lr.ph : %arrayidx81 = getelementptr inbounds double*
> > >> %11, i64 2
> > >> 
> > >> Hal, what do you think? Does it meant that cfl-aa is more
> > >> conservative (note I set the right order -cfa-aa followed by
> > >> -basicaa)? Could it be an issue with building the AliasSet
> > >> tracker?
> > >> Any pointers on how to fix this?
> > >> 
> > >> Thanks,
> > >> Ana.
> > >> 
> > >> 
> > >> 
> > >> -----Original Message-----
> > >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com ]
> > >> Sent: Wednesday, December 24, 2014 6:27 PM
> > >> To: Hal Finkel; Ana Pazos
> > >> Subject: RE: question about enabling cfl-aa and collecting a57
> > >> numbers
> > >> 
> > >> Hi,
> > >> 
> > >> I saw big regressions on cortex-A57 for the following SPEC
> > >> benchmarks after using "-mllvm -use-cfl-aa ".
> > >> 
> > >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art
> > >> 5.32%
> > >> spec.cpu2000.ref.256_bzip2 4.91%
> > >> 
> > >> Thanks,
> > >> -Jiangning
> > >> 
> > >>> -----Original Message-----
> > >>> From: Hal Finkel [mailto: hfinkel at anl.gov ]
> > >>> Sent: Wednesday, December 24, 2014 3:56 AM
> > >>> To: Ana Pazos
> > >>> Cc: Jiangning Liu
> > >>> Subject: Re: question about enabling cfl-aa and collecting a57
> > >>> numbers
> > >>> 
> > >>> ----- Original Message -----
> > >>>> From: "Ana Pazos" < apazos at codeaurora.org >
> > >>>> To: "Hal Finkel" < hfinkel at anl.gov >, "Jiangning Liu"
> > >>>> < Jiangning.Liu at arm.com >
> > >>>> Sent: Monday, December 22, 2014 4:25:07 PM
> > >>>> Subject: question about enabling cfl-aa and collecting a57
> > >>>> numbers
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> Hi Hal,
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> Can you clarify to enable cfl-aa from clang we need both these
> > >>>> flags?
> > >>>> 
> > >>>> 
> > >>> 
> > >>> Hi Ana,
> > >>> 
> > >>> Thanks for following-up on this! You only need (-mllvm
> > >>> -use-cfl-aa)
> > >>> to
> > >>> use CFL AA during the main optimization pipeline, and just this
> > >>> is
> > >>> enough to reproduce the performance regressions as far as I
> > >>> know.
> > >>> You
> > >>> can also use CFL AA during code generation (-mllvm
> > >>> -use-cfl-aa-in-codegen), which matters only if your using AA at
> > >>> all for code generation (which AArch64 does only for the
> > >>> Cortex/A53).
> > >>> 
> > >>> -Hal
> > >>> 
> > >>>> 
> > >>>> -mllvm -use-cfl-aa
> > >>>> 
> > >>>> -mllvm -cfl-aa-in-codegen
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> Hi Jiangning,
> > >>>> 
> > >>>> Did you collect a57 perf results for SPEC 2000 and 2006
> > >>>> enabling
> > >>>> these two flags?
> > >>>> 
> > >>>> Did you notice any correctness and performance regression?
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> Thanks,
> > >>>> 
> > >>>> Ana.
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> Ana Pazos
> > >>>> Qualcomm Innovation Center, Inc.
> > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code
> > >>>> Aurora Forum, a Linux Foundation Collaborative Project.
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>> 
> > >>> --
> > >>> Hal Finkel
> > >>> Assistant Computational Scientist
> > >>> Leadership Computing Facility
> > >>> Argonne National Laboratory
> > >> 
> > >> 
> > >> -- IMPORTANT NOTICE: The contents of this email and any
> > >> attachments
> > >> are confidential and may also be privileged. If you are not the
> > >> intended recipient, please notify the sender immediately and do
> > >> not
> > >> disclose the contents to any other person, use it for any
> > >> purpose,
> > >> or store or copy the information in any medium. Thank you.
> > >> 
> > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1
> > >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM
> > >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
> > >> 9NJ, Registered in England & Wales, Company No: 2548782
> > >> 
> > > 
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> 
> 
> 
> 
> 
> 
> <cflaafix.diff>
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory




More information about the llvm-dev mailing list