[LLVMdev] Contributing the Apple ARM64 compiler backend

Thu Jun 26 10:10:38 PDT 2014

>> We've also seen similar instances where multiple registers are used to
>> compute very similar addresses (such as x+0 and x+4!) and this increases
>> register pressure.

I don't have an ARM-enabled build of the tools to test with, but I suspect
that what I'm seeing here:
http://llvm.org/bugs/show_bug.cgi?id=20134

...would also be bad on AArch64.

On Wed, Jun 25, 2014 at 8:58 PM, Manjunath DN <manjunath.dn at gmail.com>
wrote:

> Hi James,
> Thanks for your reply and the hints on what can be done to optimize the
> AArch64 backend in LLVM.
> We have a SPEC license and v8 hardware, so I will start looking into it.
> Warm regards,
> Manjunath
>
>
>
> On Wed, Jun 25, 2014 at 8:42 PM, James Molloy <james.molloy at arm.com>
> wrote:
>
>> Hi Manjunath,
>>
>> At the time of writing that status we had only done our initial analysis.
>> This was done without real hardware and attempted to identify poor code
>> sequences, but we were unable to quantify how much effect this would
>> actually have.
>>
>> Since then we've done more analysis using Cortex-A57 and Cortex-A53 on an
>> internal development platform.
>>
>> For SPEC, we are between 10% and 0% behind GCC on 9 benchmarks, and 25%
>> ahead on one benchmark. Most benchmarks are less than 5% behind GCC.
>>
>> Because of the licencing of SPEC, I have to be quite restricted in what I
>> say and I can't give any numbers - sorry about that.
>>
>> We are focussing on Cortex-A57, and the things we've identified so far
>> are:
>>   * The CSEL instruction behaves worse than the equivalent branch
>> structure in at least one benchmark. In an out-of-order core, select-like
>> instructions are going to be slower than their branched equivalent if the
>> branch is predictable, due to CSEL having two dependencies.
>>
>>   * Redundant calculations inside if conditions. We've seen:
>>     1. "if (a[x].b < c[y].d || a[x].e > c[y].f)" - the calculations of
>> a[x] and c[y] are repeated, when they are common. We've also seen similar
>> instances where multiple registers are used to compute very similar
>> addresses (such as x+0 and x+4!) and this increases register pressure.
>>     2. "if (a < 0 && b == c || a > 0 && b == d)" - the comparison of 'a'
>> against zero is done twice, when the flag results of the first comparison
>> could be reused for the second comparison.
>>
>>   * For a loop such as "for (i = 0; i < n; ++i)
>> {do_something_with(&x[i]);}", GCC is using &x[i] as the loop induction
>> variable where LLVM uses i and performs the calculation &x[i] on every
>> iteration. This only creates one more add instruction but the loop we see
>> it in only has 5 or so instructions.
>>
>>   * The inline heuristics are way behind GCC's. If we crank the inline
>> threshold up to 1000, we can remove a 6.5% performance regression from one
>> benchmark entirely.
>>
>>   * We're generating (due to SLP vectorizer and a DAG combine) loads into
>> Q registers when merging consecutive loads. This is bad, because there are
>> no callee-saved Q registers! So if the live range crosses a function call,
>> it will have to be immediately spilled again. This can be easily fixed by
>> using load-pair instructions instead. I have a patch to fix this.
>>
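As a rough illustration of the CSEL item above, here is a hypothetical C sketch (not taken from SPEC, and the function names are made up for this note) of the same clamped sum written once as a select and once as a branch. Whether a compiler actually emits CSEL for the first form and a conditional branch for the second is up to the optimizer, so treat this as the source-level shape only. The point is that the select's result has to wait on both the compare and the candidate value, while a well-predicted branch lets the core speculate past the compare.

```c
#include <stddef.h>

/* Select form: the new value of acc depends on the compare AND on 'next'.
   A compiler may lower the ternary to CSEL on AArch64 (not guaranteed). */
long sum_select(const long *v, size_t n, long limit) {
    long acc = 0;
    for (size_t i = 0; i < n; ++i) {
        long next = acc + v[i];
        acc = (next > limit) ? limit : next;
    }
    return acc;
}

/* Branch form: same result. If the branch is predictable, the compare
   is off the critical path of the loop-carried dependency on acc. */
long sum_branch(const long *v, size_t n, long limit) {
    long acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += v[i];
        if (acc > limit)
            acc = limit;
    }
    return acc;
}
```

Both functions return the same value for the same input (assuming no signed overflow), so they can be swapped freely when measuring.
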
>> The list above is non-exhaustive and only contains things that we think
>> may affect multiple benchmarks or real-world code.
>>
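For the "redundant calculations inside if conditions" item, a minimal sketch of what the hand-hoisted form looks like in C. The struct layouts and function names are assumptions for illustration; only the condition itself comes from the mail. Ideally the compiler would produce the second form from the first on its own, with one base address per element and the fields reached by fixed offsets.

```c
/* Hypothetical layouts; only the field names b, e, d, f are from the mail. */
struct A { int b; int e; };
struct C { int d; int f; };

/* As written: a[x] and c[y] each appear twice in the condition. */
int cond_direct(const struct A *a, const struct C *c, int x, int y) {
    return a[x].b < c[y].d || a[x].e > c[y].f;
}

/* Hand-hoisted: each element address is formed once, and the two fields
   of each element are reached from that single base pointer. */
int cond_hoisted(const struct A *a, const struct C *c, int x, int y) {
    const struct A *ax = &a[x];
    const struct C *cy = &c[y];
    return ax->b < cy->d || ax->e > cy->f;
}
```
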
>> I've also noticed:
>>   * Our inline memcpy expansion pass is emitting "LDR q0, [..]; STR q0,
>> [..]" pairs, which is less than ideal on A53. If we switched to emitting
>> "LDP x0, x1, [..]; STP x0, x1, [..]", we'd get around 30% better inline
>> memcpy performance on A53. A57 seems to deal well with the LDR q sequence.
>>
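To make the memcpy point concrete, here is a hedged C sketch of a 16-byte copy expressed as two 64-bit halves, which is the data shape that an "LDP x0, x1 / STP x0, x1" sequence moves. The function name is hypothetical, and C cannot dictate which instructions the backend picks; a compiler is free to turn this back into a single Q-register load and store.

```c
#include <stdint.h>
#include <string.h>

/* Copy 16 bytes as two 64-bit values. The fixed-size memcpy calls are
   the portable way to express unaligned 8-byte loads and stores. */
static void copy16_pair(void *dst, const void *src) {
    uint64_t lo, hi;
    memcpy(&lo, src, 8);                            /* first 8 bytes  */
    memcpy(&hi, (const unsigned char *)src + 8, 8); /* second 8 bytes */
    memcpy(dst, &lo, 8);
    memcpy((unsigned char *)dst + 8, &hi, 8);
}
```
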
>> I'm sorry I'm unable to provide code samples for most of the issues found
>> so far - this is an artefact of them having come from SPEC. Trivial
>> examples do not always show the same behaviour, and as we're still
>> investigating we haven't yet been able to reduce most of these to an
>> anonymisable testcase.
>>
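In the absence of a reduced testcase, a toy version of the loop induction variable item might look like the following. The body of do_something_with is a made-up stand-in, and as the mail notes, trivial examples may not reproduce the codegen difference; this only shows the two source-level forms being compared.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real loop body. */
static void do_something_with(int *p) { *p += 1; }

/* Index form: i is the induction variable and &x[i] is derived from it
   on each iteration (the extra add described in the mail). */
void walk_indexed(int *x, size_t n) {
    for (size_t i = 0; i < n; ++i)
        do_something_with(&x[i]);
}

/* Pointer form: the address itself is the induction variable, so the
   loop only bumps the pointer and compares it against the end. */
void walk_pointer(int *x, size_t n) {
    for (int *p = x, *end = x + n; p != end; ++p)
        do_something_with(p);
}
```
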
>> Hope this helps, but doubt it does,
>>
>> James
>>
>> > -----Original Message-----
>> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> > On Behalf Of Manjunath N
>> > Sent: 24 June 2014 10:45
>> > To: llvmdev at cs.uiuc.edu
>> > Subject: Re: [LLVMdev] Contributing the Apple ARM64 compiler backend
>> >
>> >
>> >
>> > Eric Christopher <echristo <at> gmail.com> writes:
>> >
>> > >
>> > > > The big pain issues I see merging from ARM64 to AArch64 are:
>> > > > 1.      Apple have created a fairly complete scheduling model
>> > > > already for ARM64, and we'd have to merge the partial? model in
>> > > > AArch64 and theirs. We risk regressing performance on Apple's
>> > > > targets here, and we can't determine ourselves whether we have or
>> > > > not. This is not ideal.
>> > > > 2.      Porting over the DAG-to-DAG optimizations and any other
>> > > > optimizations that rely on the tablegen layout will be very tricky.
>> > > > 3.      The conditional compare pass is fairly comprehensive - we'd
>> > > > have to port that over or rewrite it and that would be a lot of work.
>> > > > 4.      A very quick analysis last night indicated that ARM64 has
>> > > > implemented just under half of the optimizations we discovered
>> > > > opportunities for in SPEC and EEMBC. That's a fairly comprehensive
>> > > > number of optimizations, and they won't all be easy to port.
>> > Eric,
>> > You mention that there are quite a few optimization opportunities in
>> > SPEC 2000/EEMBC.
>> > I am looking to optimize the AArch64 backend. Could you please let me
>> > know which big optimizations are possible?
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>>
>>
>
>
> --
> =========================================
> warm regards,
> Manjunath DN
>
>
>

-- 
Sanjay Patel
RotateRight, LLC
http://www.rotateright.com