[PATCH][AArch64] Prefer ldp x, x to ldr q
Arnold Schwaighofer
aschwaighofer at apple.com
Mon Aug 4 11:34:18 PDT 2014
> On Aug 4, 2014, at 6:38 AM, James Molloy <James.Molloy at arm.com> wrote:
>
> Hi Tim,
>
> [cc. Arnold as this affects SLP vectorizer rather than just the cost model]
>
> The attached patch attempts to fix this in a non-hacky way.
>
> The intent is to add explicit modelling of the costs involved in keeping values live over a callsite. The patch causes the SLP Vectorizer to scan its generated tree bottom-up, keeping track of all values live. When it encounters a call instruction (that is not part of the tree), it calls out to a new TTI hook.
>
> Most architectures will use the NoTTI version of this hook which just returns zero cost, but AArch64 returns the cost of a spill and fill if a 128-bit vector type is used.
>
> This algorithm is conservative and may not catch all cases. For example:
>
> A:
> X = load ...
> Goto B
> B:
> Call ...
> Goto C
> C:
> Store X
>
> Because there are no instructions within the SLP tree in block B, it will not see the call instruction. This is a limitation due to the difficulty of finding the "right path" from block C to block A without any helping information. In practice I don't see this as a large limitation - a conservative heuristic is still better than no heuristic (or a badly-modelled heuristic).
I think this is okay. This looks good to me.
Could we use SmallPtrSet and SmallVector instead of the STL equivalents? I think our sets are going to be small in many cases.
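
For readers following the thread, the bottom-up scan described in the quoted message can be sketched roughly as below. This is a simplified, self-contained model and not the code from the patch: the `Inst` struct, the `costOfKeepingLiveOverCall` hook name, the single-block restriction, and the cost of 2 (one spill plus one fill) are all assumptions made for illustration. It uses std::set to stay standalone; in LLVM itself the small-size containers suggested above would take its place.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for an IR instruction (not llvm::Instruction).
struct Inst {
  std::vector<std::size_t> operands; // indices of earlier instructions in the block that this one uses
  bool isCall;                       // true for a call instruction
  bool inTree;                       // true if the instruction belongs to the vectorized tree
  unsigned vectorBits;               // width of the value it defines once vectorized (0 if none)
};

// Hypothetical target hook: cost of keeping one value live across a call.
// The default target models the "just returns zero cost" behaviour.
struct DefaultTarget {
  virtual ~DefaultTarget() = default;
  virtual unsigned costOfKeepingLiveOverCall(unsigned /*vectorBits*/) const {
    return 0;
  }
};

// Assumed AArch64-like behaviour: a 128-bit vector live over a call
// costs one spill plus one fill (modelled here as 2 units).
struct AArch64LikeTarget : DefaultTarget {
  unsigned costOfKeepingLiveOverCall(unsigned vectorBits) const override {
    return vectorBits == 128 ? 2 : 0;
  }
};

// Walk one basic block from the bottom up, tracking which tree values are
// live. Every call outside the tree charges the hook once per live value.
unsigned spillCost(const std::vector<Inst> &block, const DefaultTarget &target) {
  std::set<std::size_t> live;
  unsigned cost = 0;
  for (std::size_t i = block.size(); i-- > 0;) {
    const Inst &inst = block[i];
    if (inst.inTree) {
      live.erase(i); // walking upward, the definition ends the live range
      for (std::size_t op : inst.operands)
        live.insert(op); // operands are live above this use
    } else if (inst.isCall) {
      for (std::size_t v : live)
        cost += target.costOfKeepingLiveOverCall(block[v].vectorBits);
    }
  }
  return cost;
}
```

With a block of the form load, call, store (the store using the load), the call sits inside the loaded value's live range, so the AArch64-like target reports a nonzero cost. If the call comes before the load, nothing is live at the call and the cost is zero. Because the sketch only looks at one block, it has the same blind spot as the A/B/C example in the quoted message.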