[LLVMdev] Contributing the Apple ARM64 compiler backend

Fri Jun 27 16:30:21 PDT 2014

AArch64AddressTypePromotion.cpp does a fair bit of work to help make these things work out well. It could probably be generalized for non-AArch64 targets as per the comment in the file header.

> On Jun 26, 2014, at 10:42 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> Cool HW trick. :)
> Are those 'sxtw' ops free? 
> 

That’ll depend on the details of the microarchitecture. I don’t know what is typical.

> I have to look at the HW manuals again, but I don't think x86-64 has that capability.
> 
> 
> On Thu, Jun 26, 2014 at 11:23 AM, James Molloy <james.molloy at arm.com> wrote:
> Hi Sanjay,
> 
> The behaviour I’m talking about I’ve actually pinned down to CodeGenPrepare not working too well with ISAs that don’t have a good scaled load. I have a patch to fix it that is going through performance testing now.
> 
> Your testcase seems specific to x86 – for aarch64 we get the rather spiffy:
> 
> _Z3fooPii:                              // @_Z3fooPii
> // BB#0:                                // %entry
>         add     w8, w1, #1              // =1
>         add     w9, w1, #2              // =2
>         ldr     w8, [x0, w8, sxtw #2]
>         ldr     w9, [x0, w9, sxtw #2]
>         add     w8, w9, w8
>         str     w8, [x0, w1, sxtw #2]
>         ret
> 
> The sext can be matched as part of the addressing mode for AArch64 – maybe it’s something in codegenprepare for x86 going awry?
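
For readers without the testcase to hand, here is a minimal sketch of source that would plausibly produce the listing above. It is reconstructed from the assembly alone, so the parameter names and the void return type are assumptions; only the mangled name _Z3fooPii (foo taking int* and int) comes from the listing.

```cpp
// Hypothetical source reconstructed from the quoted AArch64 assembly;
// the original testcase is not reproduced in this thread.
void foo(int *a, int i) {
    // i is a 32-bit signed int, so each index has to be sign-extended to
    // 64 bits and scaled by sizeof(int) before it is added to the pointer.
    // In the listing that work is folded into the "sxtw #2" address operand.
    a[i] = a[i + 1] + a[i + 2];
}
```

The two adds in the listing compute i+1 and i+2 in 32 bits, and the sign extension and the shift by 2 happen inside each load and store.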
> 
> Cheers,
> 
> James
> 
> From: Sanjay Patel [mailto:spatel at rotateright.com] 
> Sent: 26 June 2014 18:11
> To: Manjunath DN
> Cc: James Molloy; llvmdev at cs.uiuc.edu
> 
> 
> Subject: Re: [LLVMdev] Contributing the Apple ARM64 compiler backend
> 
> >> We've also seen similar instances where multiple registers are used to compute very similar
> >> addresses (such as x+0 and x+4!) and this increases register pressure.
> 
> I don't have an ARM-enabled build of the tools to test with, but I suspect what I'm seeing here:
> http://llvm.org/bugs/show_bug.cgi?id=20134
> 
> ...would also be bad on AArch64.
> 
> On Wed, Jun 25, 2014 at 8:58 PM, Manjunath DN <manjunath.dn at gmail.com> wrote:
> 
> Hi James,
> 
> Thanks for your reply and hints on what can be done for the AArch64 backend optimization for llvm.
> 
> We have a SPEC license and v8 hardware, so I will start looking into it.
> 
> warm regards
> 
> Manjunath
> 
> On Wed, Jun 25, 2014 at 8:42 PM, James Molloy <james.molloy at arm.com> wrote:
> 
> Hi Manjunath,
> 
> At the time of writing that status we had only done our initial analysis.
> This was done without real hardware and attempted to identify poor code
> sequences but we were unable to quantify how much effect this would actually
> have.
> 
> Since then we've done more analysis using Cortex-A57 and Cortex-A53 on an
> internal development platform.
> 
> For SPEC, we are between 10% and 0% behind GCC on 9 benchmarks, and 25%
> ahead on one benchmark. Most benchmarks are less than 5% behind GCC.
> 
> Because of the licencing of SPEC, I have to be quite restricted in what I
> say and I can't give any numbers - sorry about that.
> 
> We are focussing on Cortex-A57, and the things we've identified so far are:
>   * The CSEL instruction behaves worse than the equivalent branch structure
> in at least one benchmark. In an out-of-order core, select-like instructions
> are going to be slower than their branched equivalent if the branch is
> predictable due to CSEL having two dependencies.
> 
>   * Redundant calculations inside if conditions. We've seen:
>     1. "if (a[x].b < c[y].d || a[x].e > c[y].f)" - the calculations of a[x]
> and c[y] are repeated, when they are common. We've also seen similar
> instances where multiple registers are used to compute very similar
> addresses (such as x+0 and x+4!) and this increases register pressure.
>     2. "if (a < 0 && b == c || a > 0 && b == d)" - the first comparison of
> 'a' against zero is done twice, when the flag results of the first
> comparison could be used for the second comparison.
> 
>   * For a loop such as "for (i = 0; i < n; ++i)
> {do_something_with(&x[i]);}", GCC is using &x[i] as the loop induction
> variable where LLVM uses i and performs the calculation &x[i] on every
> iteration. This only creates one more add instruction but the loop we see it
> in only has 5 or so instructions.
> 
>   * The inline heuristics are way behind GCC's. If we crank the inline
> threshold up to 1000, we can remove a 6.5% performance regression from one
> benchmark entirely.
> 
>   * We're generating (due to SLP vectorizer and a DAG combine) loads into Q
> registers when merging consecutive loads. This is bad, because there are no
> callee-saved Q registers! So if the live range crosses a function call, it
> will have to be immediately spilled again.  This can be easily fixed by
> using load-pair instructions instead. I have a patch to fix this.
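
Two of the items above (the repeated a[x] / c[y] address calculations and the choice of loop induction variable) can be sketched at source level. This is only an illustration: the struct layouts, the function names and the body of do_something_with are made up for the example, and an optimizer is free to turn either form into the other.

```cpp
// Hypothetical record types; the field names follow the expression quoted above.
struct A { int b; int e; };
struct C { int d; int f; };

// As written in the source: a[x] and c[y] each appear twice.
bool cond_repeated(const A *a, const C *c, int x, int y) {
    return a[x].b < c[y].d || a[x].e > c[y].f;
}

// The element addresses are formed once and reused for both field accesses.
bool cond_hoisted(const A *a, const C *c, int x, int y) {
    const A &ax = a[x];
    const C &cy = c[y];
    return ax.b < cy.d || ax.e > cy.f;
}

// Hypothetical stand-in for the loop body.
inline void do_something_with(int *p) { *p *= 2; }

// Index as the induction variable: as written, &x[i] is formed from i
// on every iteration.
void loop_by_index(int *x, int n) {
    for (int i = 0; i < n; ++i)
        do_something_with(&x[i]);
}

// Address as the induction variable: the pointer itself is stepped.
void loop_by_pointer(int *x, int n) {
    if (n <= 0) return;  // match the index loop, which does nothing here
    for (int *p = x, *end = x + n; p != end; ++p)
        do_something_with(p);
}
```

Each pair computes the same result; the difference under discussion is purely in what the backend emits for the address arithmetic.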
> 
> The list above is non-exhaustive and only contains things that we think may
> affect multiple benchmarks or real-world code.
> 
> I've also noticed:
>   * Our inline memcpy expansion pass is emitting "LDR q0, [..]; STR q0,
> [..]" pairs, which is less than ideal on A53. If we switched to emitting
> "LDP x0, x1, [..]; STP x0, x1, [..]", we'd get around 30% better inline
> memcpy performance on A53. A57 seems to deal well with the LDR q sequence.
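
As a rough source-level picture of the two shapes being compared, a 16-byte copy can be expressed as two 64-bit words rather than one 128-bit value. This is a sketch under assumptions: copy16 is a hypothetical helper, and nothing here guarantees that a compiler will pick LDP/STP for it; the instruction selection is exactly what the memcpy expansion in the backend decides.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 16 bytes as two 64-bit halves. The fixed-size
// std::memcpy calls are the portable way to say "load/store 8 bytes" without
// alignment or aliasing problems; how they are lowered is up to the compiler.
inline void copy16(void *dst, const void *src) {
    std::uint64_t lo, hi;
    const unsigned char *s = static_cast<const unsigned char *>(src);
    unsigned char *d = static_cast<unsigned char *>(dst);
    std::memcpy(&lo, s, 8);      // first 8 bytes
    std::memcpy(&hi, s + 8, 8);  // second 8 bytes
    std::memcpy(d, &lo, 8);
    std::memcpy(d + 8, &hi, 8);
}
```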
> 
> I'm sorry I'm unable to provide code samples for most of the issues found so
> far - this is an artefact of them having come from SPEC. Trivial examples do
> not always show the same behaviour, and as we're still investigating we
> haven't yet been able to reduce most of these to an anonymisable testcase.
> 
> Hope this helps, but doubt it does,
> 
> James
> 
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> > Behalf Of Manjunath N
> > Sent: 24 June 2014 10:45
> > To: llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Contributing the Apple ARM64 compiler backend
> >
> >
> >
> > Eric Christopher <echristo <at> gmail.com> writes:
> >
> > >
> > > > The big pain issues I see merging from ARM64 to AArch64 are:
> > > > 1.      Apple have created a fairly complete scheduling model already for
> > > > ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We
> > > > risk regressing performance on Apple's targets here, and we can't determine
> > > > ourselves whether we have or not. This is not ideal.
> > > > 2.      Porting over the DAG-to-DAG optimizations and any other
> > > > optimizations that rely on the tablegen layout will be very tricky.
> > > > 3.      The conditional compare pass is fairly comprehensive - we'd have to
> > > > port that over or rewrite it and that would be a lot of work.
> > > > 4.      A very quick analysis last night indicated that ARM64 has
> > > > implemented just under half of the optimizations we discovered opportunities
> > > > for in SPEC and EEMBC. That's a fairly comprehensive number of
> > > > optimizations, and they won't all be easy to port.
> > Eric,
> > You mention that there are quite a few optimization opportunities in SPEC
> > 2000/EEMBC.
> > I am looking to optimize the AArch64 backend. Could you please let me know
> > the big optimizations possible?
> >
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> -- 
> 
> =========================================
> warm regards,
> Manjunath DN
> 
> 
> -- 
> Sanjay Patel
> RotateRight, LLC
> http://www.rotateright.com
