[PATCH][AArch64] Prefer ldp x, x to ldr q

Mon Jul 28 09:12:32 PDT 2014

> Also, I believe you used Tim's old address.  Forwarding to Tim's current
> address.

He'd already done that. I'm just not quite sure the patch is obviously
the right thing to do in all cases.

The DAG one is reasonably convincing on its own (as James says, it's
close enough to why hasPairedLoad exists). I'd second an LGTM on that
one, in fact.

The TTI one, though, looks iffy. For a start it only covers 64-bit
element types, while the question seems like it'd be relevant to
fusing vectors regardless of source.

The cost also doesn't seem to match what's really going on. At least
on Cyclone, approximately:

    cost(ldr qD) == cost(ldp dD1, dD2) == cost(ldr dD1; ldr dD2)/2

So it's not that loading a 128-bit value is particularly expensive
(which might skew other comparisons where it's used), but that there's
special dispensation for pairs of 64-bit values. I don't know about
other cores, but James's initial comments suggest it might be similar
for some he knows about.
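
To make that relation concrete, here's a rough sketch of it as a toy
cost table. The names (LoadSeq, relativeCost, kUnit) are made up for
illustration and aren't anything in the tree, and the numbers only
encode the approximate Cyclone ratios above, not measured latencies:

```cpp
// Toy relative-cost table for the three load sequences discussed above.
// All names here are hypothetical; the values are illustrative ratios only.
enum class LoadSeq {
  LdrQ,    // ldr qD            (one 128-bit load)
  LdpDD,   // ldp dD1, dD2      (one paired 64-bit load)
  TwoLdrD  // ldr dD1; ldr dD2  (two separate 64-bit loads)
};

// Assumed unit: the cost of a single 128-bit load.
constexpr unsigned kUnit = 1;

// The 128-bit load and the pair cost the same; only the two separate
// loads cost (roughly) double.
constexpr unsigned relativeCost(LoadSeq S) {
  return S == LoadSeq::TwoLdrD ? 2 * kUnit : kUnit;
}

// cost(ldr qD) == cost(ldp dD1, dD2) == cost(ldr dD1; ldr dD2)/2
static_assert(relativeCost(LoadSeq::LdrQ) == relativeCost(LoadSeq::LdpDD),
              "128-bit load and paired load should cost the same");
static_assert(relativeCost(LoadSeq::LdrQ) ==
                  relativeCost(LoadSeq::TwoLdrD) / 2,
              "two separate loads should cost twice the single load");
```

Seen that way, bumping the cost of the 128-bit load itself would break
the first equality; the thing that's actually special is the pair.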

Cheers.

Tim.