[PATCH][AArch64] Prefer ldp x, x to ldr q

Fri Aug 1 10:03:18 PDT 2014

Hi James,

> I just extended it to work with all 128-bit loads, and that caused some bad behaviour.
> 
> What we're saying is (assume one scalar load has a cost of 1):
> <2 x i32> costs 1
> <4 x i32> costs 4

How did you extend it? I’d expect you to return 2 for <4 x i32>, using some kind of 2*<64-bit cost> rule rather than N*<scalar>, if we’re really pretending to model some ldp effect.
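As a rough sketch of the difference between the two rules (the helper names are hypothetical and come from neither the patch nor LLVM's cost-model interface; it assumes one scalar or one 64-bit load costs 1, and that the quoted numbers mean "N for a 128-bit vector of N elements, 1 otherwise"):

```cpp
// Hypothetical cost helpers for illustration only; not LLVM's cost-model API.
// Assumption: one scalar load, or one 64-bit load, has a cost of 1.

// Per-element rule (N * <scalar>): a 128-bit vector load is charged one
// scalar load per element; anything narrower is charged 1.
constexpr unsigned perElementCost(unsigned NumElts, unsigned EltBits) {
  return (NumElts * EltBits == 128) ? NumElts : 1;
}

// Width-based rule (2 * <64-bit cost> for 128 bits): charge one unit per
// 64-bit chunk, rounding up, regardless of the element type.
constexpr unsigned perChunkCost(unsigned NumElts, unsigned EltBits) {
  return (NumElts * EltBits + 63) / 64;
}

// The per-element rule reproduces the quoted numbers.
static_assert(perElementCost(2, 32) == 1, "<2 x i32> costs 1");
static_assert(perElementCost(4, 32) == 4, "<4 x i32> costs 4");

// The width-based rule gives 2 for any 128-bit vector.
static_assert(perChunkCost(4, 32) == 2, "<4 x i32> costs 2");
static_assert(perChunkCost(2, 64) == 2, "<2 x i64> costs 2");
```

Under the width-based rule the element type drops out entirely, so <4 x i32> and <2 x i64> get the same cost.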

> So I think it only applies to <2 x i64> or <2 x double>. And yes, this whole thing is making me feel very dirty inside - if there's a better way, I don't know of it :(

I think if we really can only make it apply to the 64-bit element case, that’s the strongest evidence yet that the whole approach is wrong. How can it be OK to merge loads to form a <4 x i32> ldr, but not a <2 x i64> one? They’re exactly the same instruction.

Cheers.

Tim.