[PATCH][AArch64] Prefer ldp x, x to ldr q
James Molloy
james.molloy at arm.com
Fri Aug 1 09:52:29 PDT 2014
Hi Tim,
"all heuristics are wrong, but some are useful"? ;)
I just extended it to work with all 128-bit loads, and that caused some bad behaviour.
What we're saying is (assume one scalar load has a cost of 1):
<2 x i32> costs 1
<4 x i32> costs 4
That gives a punitive cost for <4 x i32> which is incorrect.
The only thing we're attempting to say is "there is no advantage to <2 x i64> over 2 i64 loads". The VF of 2 is important, as only a VF of 2 can be lowered with an LDP instruction. On AArch32 we could have packed values together and loaded <4 x i32> with a VLDRD, but of course that doesn't work on AArch64.
So I think it only applies to <2 x i64> or <2 x double>. And yes, this whole thing is making me feel very dirty inside - if there's a better way, I don't know of it :(
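To make the difference concrete, here is a toy sketch of the two variants. It is not the code from the patch: the VecTy struct, the function names and the "one register-sized load costs 1" baseline are all assumptions made up for illustration.

```cpp
#include <cassert>

// Toy cost model, NOT the actual LLVM code. One scalar load costs 1.
// VecTy and all function names below are hypothetical.
struct VecTy {
  unsigned NumElts; // vectorization factor (VF)
  unsigned EltBits; // width of each element in bits
};

// Assumed baseline: a vector load costs one unit per 128-bit register
// it fills, so both 64-bit and 128-bit vectors cost 1.
unsigned baselineLoadCost(VecTy Ty) {
  unsigned Bits = Ty.NumElts * Ty.EltBits;
  assert(Bits > 0 && "empty vector type");
  return (Bits + 127) / 128;
}

// The "too generic" variant: every 128-bit vector load is priced like
// the equivalent scalar loads. This is what makes <4 x i32> cost 4.
unsigned genericLoadCost(VecTy Ty) {
  if (Ty.NumElts * Ty.EltBits == 128)
    return Ty.NumElts;
  return baselineLoadCost(Ty);
}

// The narrow variant: only a VF of 2 with 64-bit elements (<2 x i64>,
// <2 x double>) is priced like two scalar loads, since that is the
// only case a single LDP of two X registers could cover.
unsigned narrowLoadCost(VecTy Ty) {
  if (Ty.NumElts == 2 && Ty.EltBits == 64)
    return 2;
  return baselineLoadCost(Ty);
}
```

With this sketch, the generic variant gives 1 for <2 x i32> and 4 for <4 x i32>, matching the numbers above, while the narrow variant leaves <4 x i32> at 1 and only raises <2 x i64> to 2.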
Cheers,
James
-----Original Message-----
From: Tim Northover [mailto:t.p.northover at gmail.com]
Sent: 29 July 2014 13:31
To: James Molloy
Cc: Chad Rosier; Tim Northover; llvm-commits
Subject: Re: [PATCH][AArch64] Prefer ldp x, x to ldr q
> While this is a slight fudge, I don't see it as a hack personally.
A "heuristic" then (scare-quotes optional)? That covers a multitude of sins.
Personally, I'd still say a hack, but that the entire function is a hack once you start thinking about ldp so nothing else is possible without reworking.
So I reckon the approach is OK for now, once it's made more generic (i.e. it should probably return the same thing for all 128-bit ops).
Cheers.
Tim.