[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Tue Jan 13 15:58:19 PST 2015

Hi folks,

Moving the discussion to llvm.dev.

None of the changes we talked earlier help.

Find attached the C source code that you can use to reproduce the issue.

clang --target=aarch64-linux-gnu  -c -mcpu=cortex-a57 -Ofast   -fno-math-errno   test.c -S -o test.s  -mllvm -debug-only=licm
LICM hoisting to while.body.lr.ph:   %21 = load double** %arrayidx8, align 8, !tbaa !5
LICM hoisting to while.body.lr.ph:   %arrayidx72 = getelementptr inbounds double* %11, i64 1
LICM hoisting to while.body.lr.ph:   %arrayidx81 = getelementptr inbounds double* %11, i64 2
LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32* %Index to i64

clang --target=aarch64-linux-gnu  -c   -mcpu=cortex-a57  -Ofast   -fno-math-errno   test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm -debug-only=licm
LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32* %Index to i64

Why CFL AA cannot allow hoisting this out:  %21 = load double** %arrayidx8, align 8, !tbaa !5

Which leads to this extra load in assembly code: ldr	x14, [x4, x9, lsl #3]

Thanks!
Ana.

-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov] 
Sent: Tuesday, January 13, 2015 12:51 AM
To: Chandler Carruth
Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George Burgess IV
Subject: Re: question about enabling cfl-aa and collecting a57 numbers

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "Ana Pazos" <apazos at codeaurora.org>, "Daniel Berlin" <dberlin at dberlin.org>, "George Burgess IV"
> <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Jiangning Liu" <Jiangning.Liu at arm.com>, "Ana Pazos" 
> <apazos at quicinc.com>
> Sent: Monday, January 12, 2015 6:43:44 PM
> Subject: Re: question about enabling cfl-aa and collecting a57 numbers
> 
> Not quite:
> 
> 
> On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org >
> wrote:
> 
> Thanks George and Daniel,
> 
> 
> 
> If the recommended order is “–basicaa –cfla-aa” it means we should fix 
> the trunk code that processes the flags -mllvm use-cfl-aa-in-codegen 
> and –mllvm use-cfl-aa, right?
> 
> 
> 
> In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we see 
> this sequence:
> 
> 
> 
> if (UseCFLAA)
> 
> PM.add(createCFLAliasAnalysisPass());
> 
> PM.add(createTypeBasedAliasAnalysisPass());
> 
> PM.add(createScopedNoAliasAAPass());
> 
> PM.add(createBasicAliasAnalysisPass());
> 
> 
> 
> So are you recommending changing to the sequence below instead?
> 
> 
> Not quite:
> 
> 
> addPass(createTypeBasedAliasAnalysisPass());
> 
> addPass(createScopedNoAliasAAPass());
> You want it here.
> 
> addPass(createBasicAliasAnalysisPass());
> 
> if (UseCFLAA)
> 
> addPass(createCFLAliasAnalysisPass());
> 
> 
> The way to think about this is as follows:
> 
> 
> if TBAA or ScopedNoAlias *can* return results, you want them to. They 
> are reflecting a-priori knowledge of aliasing.
> 
> 
> if both of those say "maybe", you want to ask CFL. If CFL says "maybe" 
> you want to ask "BasicAA".

I could very well be misunderstanding something, but I recall that we add BasicAA last so that it will be queried first. We do this because of our BasicAA delegation hack: We want a MustAlias from BasicAA to override a NoAlias from TBAA, so that we can catch cases of basic local type punning. Thus, assuming CFL delegates properly, I fail to see how switching its order w.r.t. the metadata-based analysis changes anything.

 -Hal

> 
> 
> It's possible that you could as a compile-time hack flip the last two 
> to your suggested ordering, but a) it should really only be a compile 
> time hack, and b) we haven't really tested that BasicAA actually 
> delegates rather than directly returning "maybe".
> 
> 
> Thanks,
> 
> Ana.
> 
> 
> 
> From: Daniel Berlin [mailto: dberlin at dberlin.org ]
> Sent: Friday, January 09, 2015 6:41 PM
> To: George Burgess IV; Hal Finkel
> Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc
> 
> 
> Subject: Re: question about enabling cfl-aa and collecting a57 numbers
> 
> 
> Right. I would not try with -cfl-aa -basicaa as the order, it should 
> be the other way around.
> 
> 
> If you discover performance regressions with *that*, that would be 
> highly interesting.
> 
> 
> On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV < 
> george.burgess.iv at gmail.com > wrote:
> 
> 
> 
> Thanks for the notes and observations Ana! :)
> 
> >> Hal, what do you think? Does it meant that cfl-aa is more 
> >> conservative (note I set the right order -cfa-aa followed by 
> >> -basicaa)?
> Currently, yes. The implementation as it is is focused heavily on 
> being fast, and on trying to cover cases that BasicAA can’t easily 
> cover itself. If our end goal is to use this as a replacement for 
> BasicAA, there’s probably quite a few low-hanging fruit for accuracy 
> improvements, and we can start playing with sacrificing speed in 
> pursuit of greater accuracy. On the other hand, if the end goal is to 
> append this pass to the list of AA passes already being performed, 
> then I’d be interested in seeing the difference in perf between 
> `-basicaa` and `-basicaa -cfl-aa`, if any.
> 
> WRT this issue in particular, I believe CFLAA answers conservatively
> because:
> A. it doesn’t take into account constant pointer offsets (i.e. it 
> considers a[0]..a[n] all as the same “address”) B. CFLAA is entirely 
> context insensitive
> 
> IIRC, BasicAA is somewhat context sensitive, and it does take into 
> account constant pointer offsets, so it can be more accurate in this 
> case. That being said, I don’t have access to the SPEC code, so I can 
> only speculate. :)
> 
> George
> 
> > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> > 
> > Ana,
> > 
> > Thanks so much for looking into this. I'm adding some additional 
> > relevant people to the CC line...
> > 
> > -Hal
> > 
> > ----- Original Message -----
> >> From: "Ana Pazos" < apazos at codeaurora.org >
> >> To: "Ana Pazos" < apazos at quicinc.com >, "Hal Finkel" < 
> >> hfinkel at anl.gov >, "Jiangning Liu" < Jiangning.Liu at arm.com >
> >> Sent: Friday, January 9, 2015 3:12:04 PM
> >> Subject: RE: question about enabling cfl-aa and collecting a57 
> >> numbers
> >> 
> >> Hi Hal and Jiangning,
> >> 
> >> I started to look at the effect of cfl-aa on SPEC.
> >> 
> >> The first observation is that I see more redundant loads not being 
> >> hoisted out of loops by LICM.
> >> 
> >> Due to license restrictions, I cannot paste SPEC code here, but I 
> >> think you have access to it.
> >> 
> >> I tried to simplify a top function in equake (quake.c smvp) to show 
> >> the issue. See the attached reduced test case.
> >> 
> >> If you look at the the source code you will see something like:
> >> for {
> >> ...load v[i][0]...
> >> while {
> >> ...load v[i][0]...
> >> store w[col][0]....
> >> }
> >> ...load w[i][0]...
> >> store w[i][0]...
> >> }
> >> 
> >> It should be possible to avoid load &v[i][0], &v[i][1], &vi[i][2] 
> >> inside the while loop.
> >> 
> >> In the simplified example that would be this instruction:
> >> %21 = load double** %arrayidx8, align 8, !tbaa !5
> >> 
> >> And the pointer for the load instruction:
> >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom
> >> 
> >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS =
> >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod.
> >> 
> >> With basicaa, only loads are in the alias set, but with cfl-aa you 
> >> find loads and stores, so isMod returns true.
> >> 
> >> AliasSet with basicaa:
> >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double*** 
> >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8),
> >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double** 
> >> %arrayidx77, 8), (double** %arrayidx85, 8)
> >> $5 = void
> >> 
> >> AliasSet with cfa-aa:
> >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32* 
> >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8),
> >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8),
> >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double* 
> >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8, 
> >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23, 8),
> >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double* %26, 
> >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8), (double* 
> >> %29, 8)
> >> $8 = void
> >> 
> >> You can reproduce it with the commands:
> >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o out
> >> 
> >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out LICM 
> >> hoisting to while.body.lr.ph : %21 = load double** %arrayidx8, 
> >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph : %arrayidx72 = 
> >> getelementptr inbounds double* %11, i64 1 LICM hoisting to 
> >> while.body.lr.ph : %arrayidx81 = getelementptr inbounds double* 
> >> %11, i64 2
> >> 
> >> Hal, what do you think? Does it meant that cfl-aa is more 
> >> conservative (note I set the right order -cfa-aa followed by 
> >> -basicaa)? Could it be an issue with building the AliasSet tracker?
> >> Any pointers on how to fix this?
> >> 
> >> Thanks,
> >> Ana.
> >> 
> >> 
> >> 
> >> -----Original Message-----
> >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com ]
> >> Sent: Wednesday, December 24, 2014 6:27 PM
> >> To: Hal Finkel; Ana Pazos
> >> Subject: RE: question about enabling cfl-aa and collecting a57 
> >> numbers
> >> 
> >> Hi,
> >> 
> >> I saw big regressions on cortex-A57 for the following SPEC 
> >> benchmarks after using "-mllvm -use-cfl-aa ".
> >> 
> >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art 
> >> 5.32%
> >> spec.cpu2000.ref.256_bzip2 4.91%
> >> 
> >> Thanks,
> >> -Jiangning
> >> 
> >>> -----Original Message-----
> >>> From: Hal Finkel [mailto: hfinkel at anl.gov ]
> >>> Sent: Wednesday, December 24, 2014 3:56 AM
> >>> To: Ana Pazos
> >>> Cc: Jiangning Liu
> >>> Subject: Re: question about enabling cfl-aa and collecting a57 
> >>> numbers
> >>> 
> >>> ----- Original Message -----
> >>>> From: "Ana Pazos" < apazos at codeaurora.org >
> >>>> To: "Hal Finkel" < hfinkel at anl.gov >, "Jiangning Liu"
> >>>> < Jiangning.Liu at arm.com >
> >>>> Sent: Monday, December 22, 2014 4:25:07 PM
> >>>> Subject: question about enabling cfl-aa and collecting a57 
> >>>> numbers
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> Hi Hal,
> >>>> 
> >>>> 
> >>>> 
> >>>> Can you clarify to enable cfl-aa from clang we need both these 
> >>>> flags?
> >>>> 
> >>>> 
> >>> 
> >>> Hi Ana,
> >>> 
> >>> Thanks for following-up on this! You only need (-mllvm
> >>> -use-cfl-aa)
> >>> to
> >>> use CFL AA during the main optimization pipeline, and just this is 
> >>> enough to reproduce the performance regressions as far as I know.
> >>> You
> >>> can also use CFL AA during code generation (-mllvm 
> >>> -use-cfl-aa-in-codegen), which matters only if your using AA at 
> >>> all for code generation (which AArch64 does only for the 
> >>> Cortex/A53).
> >>> 
> >>> -Hal
> >>> 
> >>>> 
> >>>> -mllvm -use-cfl-aa
> >>>> 
> >>>> -mllvm -cfl-aa-in-codegen
> >>>> 
> >>>> 
> >>>> 
> >>>> Hi Jiangning,
> >>>> 
> >>>> Did you collect a57 perf results for SPEC 2000 and 2006 enabling 
> >>>> these two flags?
> >>>> 
> >>>> Did you notice any correctness and performance regression?
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> Thanks,
> >>>> 
> >>>> Ana.
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> Ana Pazos
> >>>> Qualcomm Innovation Center, Inc.
> >>>> The Qualcomm Innovation Center, Inc. is a member of the Code 
> >>>> Aurora Forum, a Linux Foundation Collaborative Project.
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >>> --
> >>> Hal Finkel
> >>> Assistant Computational Scientist
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >> 
> >> 
> >> -- IMPORTANT NOTICE: The contents of this email and any attachments 
> >> are confidential and may also be privileged. If you are not the 
> >> intended recipient, please notify the sender immediately and do not 
> >> disclose the contents to any other person, use it for any purpose, 
> >> or store or copy the information in any medium. Thank you.
> >> 
> >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 
> >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM 
> >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 
> >> 9NJ, Registered in England & Wales, Company No: 2548782
> >> 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: application/octet-stream
Size: 582 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150113/d826dabb/attachment.obj>