[PATCH] AArch64: big endian constant vector pools

Fri Apr 11 11:01:06 PDT 2014

>     define void @foo( <4 x i64>* %loadaddr, <4 x i32>* %storeaddr ) {
>       %1 = load <4 x i64>* %loadaddr
>       %2 = add <4 x i64> %1, <i64 1, i64 2, i64 3, i64 4>
>       %3 = trunc <4 x i64> %2 to <4 x i32>
>       store <4 x i32> %3, <4 x i32>* %storeaddr
>       ret void
>     }
>
>   Note that the const value is loaded with ldr q and used by vector instruction, thus swapping of the two upper and the two lower  vector elements.

I see. I was worried there would be things like this: the remapping
SDNodeXForm I gave takes care of (most) explicit vector indices, but
there are still a whole bunch of implicit ones too. In this case,
concat_vectors, and it has an even simpler problematic example:

define <4 x i32> @foo(<2 x i32> %l, <2 x i32> %r) {
  %res = shufflevector <2 x i32> %l, <2 x i32> %r, <4 x i32> <i32 0,
i32 1, i32 2, i32 3>
  ret <4 x i32> %res
}

All our patterns assume that the LHS of a concat goes in lanes
[0..N-1] and the RHS goes in [N..2N-1]. This is no longer correct in
the LDR-based big-endian world. We probably have to add to the list:
extract_subvector and scalar_to_vector. I can't *think* of any others,
but that doesn't mean they don't exist. Almost certainly any
instruction ending in a "2" is going to be affected.

Now, onto solutions...

1. It's not necessarily too late to switch to canonicalising on
ld1/st1 for the in-register representation. As before, the trade-offs
are: bitcast is odd; function calls & returns are odd; vector indices
are normal though. Probably less efficient code overall. Overall,
implementation is probably quite a lot simpler.

2. We could obviously split every pattern involving the problematic
nodes into IsLE and IsBE variants. That's the sledge-hammer solution,
and is very ugly.

3. We could create a new AArch64ISD::CONCAT_LOHI or something during
ISelLowering that reversed it in the BE cases. This probably has nasty
implications for DAGCombiner so it's not really a good idea.

4. Swapping the arguments to all concat_vectors during ISelLowering is
probably a very bad idea, but I'll mention it for completeness. The
DAGCombiner will not be expecting that switch to have happened and
will almost certainly get it wrong (e.g. by combining
"(extract_element (concat_vectors L, R), 0)" to "(extract_element L,
0)").

I can't think of any way to make this problem go away with clever
patterns at the moment.

Cheers.

Tim.