[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Daniel Berlin dberlin at dberlin.org
Tue Jan 13 20:58:47 PST 2015


Can you send me actual LLVM IR or a preprocessed source from using -E?
I don't have a machine handy that has headers that target that arch.


On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> wrote:

> Anything other than noalias or mustalias should be getting passed down the
> stack, so either that is not happening or CFL aa is giving better answers
> and something does worse with those better answers.  I'll take a look this
> evening
> On Jan 13, 2015 3:58 PM, "Ana Pazos" <apazos at codeaurora.org> wrote:
>
>> Hi folks,
>>
>> Moving the discussion to llvm.dev.
>>
>> None of the changes we talked earlier help.
>>
>> Find attached the C source code that you can use to reproduce the issue.
>>
>> clang --target=aarch64-linux-gnu  -c -mcpu=cortex-a57 -Ofast
>>  -fno-math-errno   test.c -S -o test.s  -mllvm -debug-only=licm
>> LICM hoisting to while.body.lr.ph:   %21 = load double** %arrayidx8,
>> align 8, !tbaa !5
>> LICM hoisting to while.body.lr.ph:   %arrayidx72 = getelementptr
>> inbounds double* %11, i64 1
>> LICM hoisting to while.body.lr.ph:   %arrayidx81 = getelementptr
>> inbounds double* %11, i64 2
>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32* %Index to i64
>>
>> clang --target=aarch64-linux-gnu  -c   -mcpu=cortex-a57  -Ofast
>>  -fno-math-errno   test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm
>> -debug-only=licm
>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32* %Index to i64
>>
>> Why CFL AA cannot allow hoisting this out:  %21 = load double**
>> %arrayidx8, align 8, !tbaa !5
>>
>> Which leads to this extra load in assembly code: ldr    x14, [x4, x9, lsl
>> #3]
>>
>> Thanks!
>> Ana.
>>
>> -----Original Message-----
>> From: Hal Finkel [mailto:hfinkel at anl.gov]
>> Sent: Tuesday, January 13, 2015 12:51 AM
>> To: Chandler Carruth
>> Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George Burgess IV
>> Subject: Re: question about enabling cfl-aa and collecting a57 numbers
>>
>> ----- Original Message -----
>> > From: "Chandler Carruth" <chandlerc at gmail.com>
>> > To: "Ana Pazos" <apazos at codeaurora.org>, "Daniel Berlin" <
>> dberlin at dberlin.org>, "George Burgess IV"
>> > <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
>> > Cc: "Jiangning Liu" <Jiangning.Liu at arm.com>, "Ana Pazos"
>> > <apazos at quicinc.com>
>> > Sent: Monday, January 12, 2015 6:43:44 PM
>> > Subject: Re: question about enabling cfl-aa and collecting a57 numbers
>> >
>> > Not quite:
>> >
>> >
>> > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org >
>> > wrote:
>> >
>> > Thanks George and Daniel,
>> >
>> >
>> >
>> > If the recommended order is “–basicaa –cfla-aa” it means we should fix
>> > the trunk code that processes the flags -mllvm use-cfl-aa-in-codegen
>> > and –mllvm use-cfl-aa, right?
>> >
>> >
>> >
>> > In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we see
>> > this sequence:
>> >
>> >
>> >
>> > if (UseCFLAA)
>> >
>> > PM.add(createCFLAliasAnalysisPass());
>> >
>> > PM.add(createTypeBasedAliasAnalysisPass());
>> >
>> > PM.add(createScopedNoAliasAAPass());
>> >
>> > PM.add(createBasicAliasAnalysisPass());
>> >
>> >
>> >
>> > So are you recommending changing to the sequence below instead?
>> >
>> >
>> > Not quite:
>> >
>> >
>> > addPass(createTypeBasedAliasAnalysisPass());
>> >
>> > addPass(createScopedNoAliasAAPass());
>> > You want it here.
>> >
>> > addPass(createBasicAliasAnalysisPass());
>> >
>> > if (UseCFLAA)
>> >
>> > addPass(createCFLAliasAnalysisPass());
>> >
>> >
>> > The way to think about this is as follows:
>> >
>> >
>> > if TBAA or ScopedNoAlias *can* return results, you want them to. They
>> > are reflecting a-priori knowledge of aliasing.
>> >
>> >
>> > if both of those say "maybe", you want to ask CFL. If CFL says "maybe"
>> > you want to ask "BasicAA".
>>
>> I could very well be misunderstanding something, but I recall that we add
>> BasicAA last so that it will be queried first. We do this because of our
>> BasicAA delegation hack: We want a MustAlias from BasicAA to override a
>> NoAlias from TBAA, so that we can catch cases of basic local type punning.
>> Thus, assuming CFL delegates properly, I fail to see how switching its
>> order w.r.t. the metadata-based analysis changes anything.
>>
>>  -Hal
>>
>> >
>> >
>> > It's possible that you could as a compile-time hack flip the last two
>> > to your suggested ordering, but a) it should really only be a compile
>> > time hack, and b) we haven't really tested that BasicAA actually
>> > delegates rather than directly returning "maybe".
>> >
>> >
>> > Thanks,
>> >
>> > Ana.
>> >
>> >
>> >
>> > From: Daniel Berlin [mailto: dberlin at dberlin.org ]
>> > Sent: Friday, January 09, 2015 6:41 PM
>> > To: George Burgess IV; Hal Finkel
>> > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc
>> >
>> >
>> > Subject: Re: question about enabling cfl-aa and collecting a57 numbers
>> >
>> >
>> > Right. I would not try with -cfl-aa -basicaa as the order, it should
>> > be the other way around.
>> >
>> >
>> > If you discover performance regressions with *that*, that would be
>> > highly interesting.
>> >
>> >
>> > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV <
>> > george.burgess.iv at gmail.com > wrote:
>> >
>> >
>> >
>> > Thanks for the notes and observations Ana! :)
>> >
>> > >> Hal, what do you think? Does it meant that cfl-aa is more
>> > >> conservative (note I set the right order -cfa-aa followed by
>> > >> -basicaa)?
>> > Currently, yes. The implementation as it is is focused heavily on
>> > being fast, and on trying to cover cases that BasicAA can’t easily
>> > cover itself. If our end goal is to use this as a replacement for
>> > BasicAA, there’s probably quite a few low-hanging fruit for accuracy
>> > improvements, and we can start playing with sacrificing speed in
>> > pursuit of greater accuracy. On the other hand, if the end goal is to
>> > append this pass to the list of AA passes already being performed,
>> > then I’d be interested in seeing the difference in perf between
>> > `-basicaa` and `-basicaa -cfl-aa`, if any.
>> >
>> > WRT this issue in particular, I believe CFLAA answers conservatively
>> > because:
>> > A. it doesn’t take into account constant pointer offsets (i.e. it
>> > considers a[0]..a[n] all as the same “address”) B. CFLAA is entirely
>> > context insensitive
>> >
>> > IIRC, BasicAA is somewhat context sensitive, and it does take into
>> > account constant pointer offsets, so it can be more accurate in this
>> > case. That being said, I don’t have access to the SPEC code, so I can
>> > only speculate. :)
>> >
>> > George
>> >
>> > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov > wrote:
>> > >
>> > > Ana,
>> > >
>> > > Thanks so much for looking into this. I'm adding some additional
>> > > relevant people to the CC line...
>> > >
>> > > -Hal
>> > >
>> > > ----- Original Message -----
>> > >> From: "Ana Pazos" < apazos at codeaurora.org >
>> > >> To: "Ana Pazos" < apazos at quicinc.com >, "Hal Finkel" <
>> > >> hfinkel at anl.gov >, "Jiangning Liu" < Jiangning.Liu at arm.com >
>> > >> Sent: Friday, January 9, 2015 3:12:04 PM
>> > >> Subject: RE: question about enabling cfl-aa and collecting a57
>> > >> numbers
>> > >>
>> > >> Hi Hal and Jiangning,
>> > >>
>> > >> I started to look at the effect of cfl-aa on SPEC.
>> > >>
>> > >> The first observation is that I see more redundant loads not being
>> > >> hoisted out of loops by LICM.
>> > >>
>> > >> Due to license restrictions, I cannot paste SPEC code here, but I
>> > >> think you have access to it.
>> > >>
>> > >> I tried to simplify a top function in equake (quake.c smvp) to show
>> > >> the issue. See the attached reduced test case.
>> > >>
>> > >> If you look at the the source code you will see something like:
>> > >> for {
>> > >> ...load v[i][0]...
>> > >> while {
>> > >> ...load v[i][0]...
>> > >> store w[col][0]....
>> > >> }
>> > >> ...load w[i][0]...
>> > >> store w[i][0]...
>> > >> }
>> > >>
>> > >> It should be possible to avoid load &v[i][0], &v[i][1], &vi[i][2]
>> > >> inside the while loop.
>> > >>
>> > >> In the simplified example that would be this instruction:
>> > >> %21 = load double** %arrayidx8, align 8, !tbaa !5
>> > >>
>> > >> And the pointer for the load instruction:
>> > >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom
>> > >>
>> > >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS =
>> > >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod.
>> > >>
>> > >> With basicaa, only loads are in the alias set, but with cfl-aa you
>> > >> find loads and stores, so isMod returns true.
>> > >>
>> > >> AliasSet with basicaa:
>> > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double***
>> > >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8),
>> > >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double**
>> > >> %arrayidx77, 8), (double** %arrayidx85, 8)
>> > >> $5 = void
>> > >>
>> > >> AliasSet with cfa-aa:
>> > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32*
>> > >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8),
>> > >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8),
>> > >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double*
>> > >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8,
>> > >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23, 8),
>> > >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double* %26,
>> > >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8), (double*
>> > >> %29, 8)
>> > >> $8 = void
>> > >>
>> > >> You can reproduce it with the commands:
>> > >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o out
>> > >>
>> > >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out LICM
>> > >> hoisting to while.body.lr.ph : %21 = load double** %arrayidx8,
>> > >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph : %arrayidx72 =
>> > >> getelementptr inbounds double* %11, i64 1 LICM hoisting to
>> > >> while.body.lr.ph : %arrayidx81 = getelementptr inbounds double*
>> > >> %11, i64 2
>> > >>
>> > >> Hal, what do you think? Does it meant that cfl-aa is more
>> > >> conservative (note I set the right order -cfa-aa followed by
>> > >> -basicaa)? Could it be an issue with building the AliasSet tracker?
>> > >> Any pointers on how to fix this?
>> > >>
>> > >> Thanks,
>> > >> Ana.
>> > >>
>> > >>
>> > >>
>> > >> -----Original Message-----
>> > >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com ]
>> > >> Sent: Wednesday, December 24, 2014 6:27 PM
>> > >> To: Hal Finkel; Ana Pazos
>> > >> Subject: RE: question about enabling cfl-aa and collecting a57
>> > >> numbers
>> > >>
>> > >> Hi,
>> > >>
>> > >> I saw big regressions on cortex-A57 for the following SPEC
>> > >> benchmarks after using "-mllvm -use-cfl-aa ".
>> > >>
>> > >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art
>> > >> 5.32%
>> > >> spec.cpu2000.ref.256_bzip2 4.91%
>> > >>
>> > >> Thanks,
>> > >> -Jiangning
>> > >>
>> > >>> -----Original Message-----
>> > >>> From: Hal Finkel [mailto: hfinkel at anl.gov ]
>> > >>> Sent: Wednesday, December 24, 2014 3:56 AM
>> > >>> To: Ana Pazos
>> > >>> Cc: Jiangning Liu
>> > >>> Subject: Re: question about enabling cfl-aa and collecting a57
>> > >>> numbers
>> > >>>
>> > >>> ----- Original Message -----
>> > >>>> From: "Ana Pazos" < apazos at codeaurora.org >
>> > >>>> To: "Hal Finkel" < hfinkel at anl.gov >, "Jiangning Liu"
>> > >>>> < Jiangning.Liu at arm.com >
>> > >>>> Sent: Monday, December 22, 2014 4:25:07 PM
>> > >>>> Subject: question about enabling cfl-aa and collecting a57
>> > >>>> numbers
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> Hi Hal,
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> Can you clarify to enable cfl-aa from clang we need both these
>> > >>>> flags?
>> > >>>>
>> > >>>>
>> > >>>
>> > >>> Hi Ana,
>> > >>>
>> > >>> Thanks for following-up on this! You only need (-mllvm
>> > >>> -use-cfl-aa)
>> > >>> to
>> > >>> use CFL AA during the main optimization pipeline, and just this is
>> > >>> enough to reproduce the performance regressions as far as I know.
>> > >>> You
>> > >>> can also use CFL AA during code generation (-mllvm
>> > >>> -use-cfl-aa-in-codegen), which matters only if your using AA at
>> > >>> all for code generation (which AArch64 does only for the
>> > >>> Cortex/A53).
>> > >>>
>> > >>> -Hal
>> > >>>
>> > >>>>
>> > >>>> -mllvm -use-cfl-aa
>> > >>>>
>> > >>>> -mllvm -cfl-aa-in-codegen
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> Hi Jiangning,
>> > >>>>
>> > >>>> Did you collect a57 perf results for SPEC 2000 and 2006 enabling
>> > >>>> these two flags?
>> > >>>>
>> > >>>> Did you notice any correctness and performance regression?
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> Thanks,
>> > >>>>
>> > >>>> Ana.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> Ana Pazos
>> > >>>> Qualcomm Innovation Center, Inc.
>> > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code
>> > >>>> Aurora Forum, a Linux Foundation Collaborative Project.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>
>> > >>> --
>> > >>> Hal Finkel
>> > >>> Assistant Computational Scientist
>> > >>> Leadership Computing Facility
>> > >>> Argonne National Laboratory
>> > >>
>> > >>
>> > >> -- IMPORTANT NOTICE: The contents of this email and any attachments
>> > >> are confidential and may also be privileged. If you are not the
>> > >> intended recipient, please notify the sender immediately and do not
>> > >> disclose the contents to any other person, use it for any purpose,
>> > >> or store or copy the information in any medium. Thank you.
>> > >>
>> > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1
>> > >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM
>> > >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
>> > >> 9NJ, Registered in England & Wales, Company No: 2548782
>> > >>
>> > >
>> > > --
>> > > Hal Finkel
>> > > Assistant Computational Scientist
>> > > Leadership Computing Facility
>> > > Argonne National Laboratory
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/6ab40d61/attachment.html>


More information about the llvm-dev mailing list