[PATCH][AArch64] Prefer ldp x, x to ldr q

Mon Aug 4 11:34:18 PDT 2014

> On Aug 4, 2014, at 6:38 AM, James Molloy <James.Molloy at arm.com> wrote:
> 
> Hi Tim,
> 
> [cc. Arnold as this affects SLP vectorizer rather than just the cost model]
> 
> The attached patch attempts to fix this in a non-hacky way.
> 
> The intent is to add explicit modelling of the costs involved in keeping values live over a callsite. The patch causes the SLP Vectorizer to scan its generated tree bottom-up, keeping track of all live values. When it encounters a call instruction (that is not part of the tree), it calls out to a new TTI hook.
> 
> Most architectures will use the NoAA version of this hook which just returns zero cost, but AArch64 returns the cost of a spill and fill if a 128-bit vector type is used.
> 
> This algorithm is conservative and may not catch all cases. For example:
> 
> A:
>  X = load ...
>  Goto B
> B:
>  Call ...
>  Goto C
> C:
>  Store X
> 
> Because there are no instructions within the SLP tree in block B, it will not see the call instruction. This is a limitation due to the difficulty of finding the "right path" from block C to block A without any helping information. In practice I don't see this as a large limitation - a conservative heuristic is still better than no heuristic (or a badly-modelled heuristic).
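
For readers following along, here is a rough, standalone sketch of the scan as I understand it from the description above. It is not the code in the patch: the `Inst` struct, `costOfKeepingLiveOverCall`, `spillCost`, and the unit cost of 2 are all hypothetical names and numbers chosen for illustration, it models a single basic block only, and it uses std containers purely to stay self-contained.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified model of one instruction in a basic block.
struct Inst {
  bool IsCall;                  // true for call instructions
  bool InTree;                  // true if the instruction is part of the SLP tree
  std::vector<std::size_t> Ops; // indices of earlier instructions used as operands
};

// Stand-in for the target hook (name and unit cost are assumptions).
// A default target returns zero; an AArch64-like target charges one
// spill plus one fill for every vector value live across the call.
int costOfKeepingLiveOverCall(std::size_t NumLive, bool TargetSpillsVectors) {
  const int SpillPlusFill = 2; // assumed unit cost, for illustration only
  return TargetSpillsVectors ? static_cast<int>(NumLive) * SpillPlusFill : 0;
}

// Walk the block bottom-up, tracking which tree values are live, and
// charge the hook whenever a call outside the tree is crossed.
int spillCost(const std::vector<Inst> &Block, bool TargetSpillsVectors) {
  std::set<std::size_t> Live;
  int Cost = 0;
  for (std::size_t I = Block.size(); I-- > 0;) {
    const Inst &In = Block[I];
    if (In.InTree) {
      Live.erase(I); // defined here, so not live above this point
      for (std::size_t Op : In.Ops)
        if (Block[Op].InTree)
          Live.insert(Op); // used here, so live above this point
    } else if (In.IsCall) {
      Cost += costOfKeepingLiveOverCall(Live.size(), TargetSpillsVectors);
    }
  }
  return Cost;
}
```

With a block of load, call, store (where the store uses the load and both are in the tree), the load is live when the call is crossed, so only the AArch64-like target is charged. In the three-block A/B/C example above, a scan like this never visits block B, which is exactly the limitation described.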

I think this is okay. This looks good to me.

Could we use SmallPtrSet and SmallVector instead of the STL equivalents? I think our sets are going to be small in many cases.