<div dir="ltr"><div>Hi James,</div><div>Thanks for your reply and the hints on what can be done to optimize the AArch64 backend in LLVM.</div><div>We have a SPEC license and ARMv8 hardware, so I will start looking into it.</div>

<div>warm regards</div><div>Manjunath</div><div> </div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jun 25, 2014 at 8:42 PM, James Molloy <span dir="ltr"><<a href="mailto:james.molloy@arm.com" target="_blank">james.molloy@arm.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Manjunath,<br>

<br>

At the time of writing that status we had only done our initial analysis.<br>

This was done without real hardware and attempted to identify poor code<br>

sequences but we were unable to quantify how much effect this would actually<br>

have.<br>

<br>

Since then we've done more analysis using Cortex-A57 and Cortex-A53 on an<br>

internal development platform.<br>

<br>

For SPEC, we are between 0% and 10% behind GCC on 9 benchmarks, and 25%<br>

ahead on one benchmark. Most benchmarks are less than 5% behind GCC.<br>

<br>

Because of the licensing of SPEC, I have to be quite restricted in what I<br>

say and I can't give any more detailed numbers - sorry about that.<br>

<br>

We are focussing on Cortex-A57, and the things we've identified so far are:<br>

  * The CSEL instruction behaves worse than the equivalent branch structure<br>

in at least one benchmark. On an out-of-order core, a select-like instruction<br>

is going to be slower than its branched equivalent when the branch is<br>

predictable, because CSEL has to wait for both of its source operands (and<br>

the flags), whereas a correctly predicted branch only executes one path.<br>

<br>

  * Redundant calculations inside if conditions. We've seen:<br>

    1. "if (a[x].b < c[y].d || a[x].e > c[y].f)" - the calculations of a[x]<br>

and c[y] are repeated, when they are common. We've also seen similar<br>

instances where multiple registers are used to compute very similar<br>

addresses (such as x+0 and x+4!) and this increases register pressure.<br>

    2. "if (a < 0 && b == c || a > 0 && b == d)" - 'a' is compared against<br>

zero twice, when the flags set by the first comparison could be reused for<br>

the second comparison.<br>

<br>

  * For a loop such as "for (i = 0; i < n; ++i)<br>

{do_something_with(&x[i]);}", GCC is using &x[i] as the loop induction<br>

variable where LLVM uses i and performs the calculation &x[i] on every<br>

iteration. This only creates one more add instruction but the loop we see it<br>

in only has 5 or so instructions.<br>

<br>

  * The inline heuristics are way behind GCC's. If we crank the inline<br>

threshold up to 1000, we can remove a 6.5% performance regression from one<br>

benchmark entirely.<br>

<br>

  * We're generating (due to the SLP vectorizer and a DAG combine) loads into Q<br>

registers when merging consecutive loads. This is bad, because there are no<br>

fully callee-saved Q registers (AAPCS64 only preserves the low 64 bits of<br>

v8-v15)! So if the live range crosses a function call, the value has to be<br>

spilled and reloaded around the call. This can be easily fixed by<br>

using load-pair instructions instead. I have a patch to fix this.<br>

<br>
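To make the two "redundant calculation" patterns above concrete, here is a minimal C sketch. It is not taken from SPEC: the struct layouts and function names are hypothetical, and the "hoisted" variants only show at source level what a compiler could do internally.<br>
<pre>
```c
/* Hypothetical record types; the field names mirror the quoted condition. */
struct A { int b; int e; };
struct C { int d; int f; };

/* As written: the addresses of a[x] and c[y] each appear twice. */
int cond_repeated(const struct A *a, const struct C *c, int x, int y) {
    return a[x].b < c[y].d || a[x].e > c[y].f;
}

/* Hand-hoisted form: each element address is computed once and reused. */
int cond_hoisted(const struct A *a, const struct C *c, int x, int y) {
    const struct A *ax = &a[x];
    const struct C *cy = &c[y];
    return ax->b < cy->d || ax->e > cy->f;
}

/* Second pattern as written: 'a' is tested against zero twice. */
int cond_twice(int a, int b, int c, int d) {
    return (a < 0 && b == c) || (a > 0 && b == d);
}

/* Equivalent form: the sign of 'a' picks the value b must equal, so a
   single compare of 'a' against zero can feed both outcomes. */
int cond_once(int a, int b, int c, int d) {
    if (a == 0) return 0;
    return b == (a < 0 ? c : d);
}
```
</pre>
Each pair returns the same result for all inputs; whether a given compiler emits the shorter sequence for the first form of each pair is exactly what has to be checked in the generated assembly.<br>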

The list above is non-exhaustive and only contains things that we think may<br>

affect multiple benchmarks or real-world code.<br>

<br>
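The loop induction-variable item in the list above can be sketched in C as follows. The body of do_something_with() and the element type are assumptions made purely for illustration; the pointer version spells out at source level the form GCC is described as choosing, and it assumes n >= 0.<br>
<pre>
```c
/* Hypothetical loop body standing in for do_something_with(). */
static void do_something_with(int *p) { *p += 1; }

/* Index form: &x[i] is derived from i on every iteration. */
void loop_index(int *x, int n) {
    for (int i = 0; i < n; ++i) {
        do_something_with(&x[i]);
    }
}

/* Pointer form: the address itself is the induction variable, so no
   per-iteration address calculation is needed. Assumes n >= 0. */
void loop_pointer(int *x, int n) {
    for (int *p = x, *end = x + n; p != end; ++p) {
        do_something_with(p);
    }
}
```
</pre>
Both functions visit the same elements in the same order; the difference only matters in the generated code, where it is one add in a loop of roughly five instructions.<br>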

I've also noticed:<br>

  * Our inline memcpy expansion pass is emitting "LDR q0, [..]; STR q0,<br>

[..]" pairs, which is less than ideal on A53. If we switched to emitting<br>

"LDP x0, x1, [..]; STP x0, x1, [..]", we'd get around 30% better inline<br>

memcpy performance on A53. A57 seems to deal well with the LDR q sequence.<br>

<br>

I'm sorry I'm unable to provide code samples for most of the issues found so<br>

far - this is an artefact of them having come from SPEC. Trivial examples do<br>

not always show the same behaviour, and as we're still investigating we<br>

haven't yet been able to reduce most of these to an anonymisable testcase.<br>

<br>

Hope this helps, but doubt it does,<br>

<br>

James<br>

<br>

> -----Original Message-----<br>

> From: <a href="mailto:llvmdev-bounces@cs.uiuc.edu">llvmdev-bounces@cs.uiuc.edu</a> [mailto:<a href="mailto:llvmdev-bounces@cs.uiuc.edu">llvmdev-bounces@cs.uiuc.edu</a>] On<br>

> Behalf Of Manjunath N<br>

> Sent: 24 June 2014 10:45<br>

> To: <a href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a><br>

> Subject: Re: [LLVMdev] Contributing the Apple ARM64 compiler backend<br>

><br>

><br>

><br>

> Eric Christopher <echristo <at> <a href="http://gmail.com" target="_blank">gmail.com</a>> writes:<br>

><br>

> ><br>

> > > The big pain issues I see merging from ARM64 to AArch64 are:<br>

> > > 1.      Apple have created a fairly complete scheduling model already for<br>

> > > ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We<br>

> > > risk regressing performance on Apple's targets here, and we can't determine<br>

> > > ourselves whether we have or not. This is not ideal.<br>

> > > 2.      Porting over the DAG-to-DAG optimizations and any other<br>

> > > optimizations that rely on the tablegen layout will be very tricky.<br>

> > > 3.      The conditional compare pass is fairly comprehensive - we'd have to<br>

> > > port that over or rewrite it and that would be a lot of work.<br>

> > > 4.      A very quick analysis last night indicated that ARM64 has<br>

> > > implemented just under half of the optimizations we discovered opportunities<br>

> > > for in SPEC and EEMBC. That's a fairly comprehensive number of<br>

> > > optimizations, and they won't all be easy to port.<br>

> Eric,<br>

> You mention that there are quite a few optimization opportunities in SPEC<br>

> 2000/EEMBC.<br>

> I am looking to optimize the AArch64 backend. Could you please let me know<br>

> what the big possible optimizations are?<br>

><br>

><br>

><br>

> _______________________________________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>


</blockquote></div><br><br clear="all"><br>-- <br><div>=========================================<br>warm regards,<br>Manjunath DN<br></div>

</div>