[llvm-dev] [AArch64] Address computation folding

Wed Nov 11 14:44:51 PST 2015

On Wed, Nov 11, 2015 at 3:08 PM, Tim Northover <t.p.northover at gmail.com> wrote:

> As you say, very microarchitecture-dependent. The code produced is
> probably optimal for Cyclone ("[x0, x8]" is no more expensive than
> "[x8]" and the "lsl" is slightly cheaper than the complicated "add").
> If I'm reading the Cortex-A57 optimisation guide correctly, the same
> reasoning applies there too.

Yeah, my reading is the same.  For Cortex-A57 it looks like the same
number of micro-ops and the same latency either way, since the
register-offset form LDR [x1, x2] costs no more than a plain LDR [x1].
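
For anyone skimming the thread, here is a rough C model of the two
shapes being compared. It is only a sketch: the shift amount (#3, i.e.
8-byte elements), the index register x1, and all function names are
assumptions for illustration, not taken from the original test case.
The instruction sequences in the comments are what each function is
meant to model, not guaranteed compiler output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Form A (assumed shape): shift first, then a register-offset load.
 *   lsl x8, x1, #3
 *   ldr x0, [x0, x8]
 */
uint64_t load_shift_then_index(const unsigned char *base, uint64_t idx) {
    uint64_t off = idx << 3;            /* models the lsl */
    uint64_t v;
    memcpy(&v, base + off, sizeof v);   /* models ldr [base, off] */
    return v;
}

/* Form B (assumed shape): fold the shift into the add, then a plain load.
 *   add x8, x0, x1, lsl #3
 *   ldr x0, [x8]
 */
uint64_t load_add_then_deref(const unsigned char *base, uint64_t idx) {
    const unsigned char *addr = base + (idx << 3);  /* models add ..., lsl #3 */
    uint64_t v;
    memcpy(&v, addr, sizeof v);                     /* models ldr [addr] */
    return v;
}

/* Both forms read the same address, so they must return the same value. */
void check_forms_agree(const unsigned char *base, uint64_t idx) {
    assert(load_shift_then_index(base, idx) == load_add_then_deref(base, idx));
}
```

Both forms are two instructions and load from the same address, so the
choice only matters if one of the individual instructions is more
expensive on the target core.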

-- Meador