[llvm-dev] [AArch64] Address computation folding

Wed Nov 11 13:15:09 PST 2015

Hi,

Indeed, the complex add is more expensive on all Cortex cores I know of.

However, there is an important point here: the code sequence we generate
requires two registers to be live instead of one. In loops with high register
pressure, we're probably losing performance.
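To make the trade-off concrete, here is a hypothetical C function in which one
scaled index feeds both a load and a store. The AArch64 in the comments is a
hand-written sketch of the two sequences under discussion; the register
assignments (base in x0, i in x1, delta in x2) are assumed, and it is not
actual compiler output.

```c
#include <stdint.h>

/* Hypothetical example: the scaled index (i * 8) is used by both the load
 * and the store of base[i], so the shift has more than one use.
 * The assembly below is an illustrative sketch, not compiler output. */
void bump_elem(int64_t *base, int64_t i, int64_t delta) {
    /* Lowering A (shift computed once, register-offset addressing):
     *     lsl  x8, x1, #3           // x8 = i * 8
     *     ldr  x9, [x0, x8]         // base (x0) and offset (x8) both live
     *     add  x9, x9, x2
     *     str  x9, [x0, x8]
     *
     * Lowering B (full address computed once with a shifted add):
     *     add  x8, x0, x1, lsl #3   // x8 = base + i * 8
     *     ldr  x9, [x8]             // only x8 needs to stay live
     *     add  x9, x9, x2
     *     str  x9, [x8]
     */
    base[i] += delta;
}
```

Between the address computation and the store, lowering A keeps both x0 and
x8 live, while B needs only x8 (assuming base is not used afterwards). A
avoids the more expensive shifted add; B frees a register, which may matter
more in loops with high register pressure.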

James
On Wed, 11 Nov 2015 at 21:09, Tim Northover via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

> On 11 November 2015 at 11:57, Meador Inge <meadori at gmail.com> wrote:
> > Why wouldn't it consider the number of uses in any operation?  The
> > "expected" code is easy to get by checking the number of uses.  This
> > may be desirable on some micro-architectures depending on the cost of
> > the various loads and stores.
>
> As you say, very microarchitecture-dependent. The code produced is
> probably optimal for Cyclone ("[x0, x8]" is no more expensive than
> "[x8]" and the "lsl" is slightly cheaper than the complicated "add").
> If I'm reading the Cortex-A57 optimisation guide correctly, the same
> reasoning applies there too.
>
> Cheers.
>
> Tim.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>