[PATCH] AArch64: big endian constant vector pools

Jiangning Liu liujiangning1 at gmail.com
Sun Apr 13 20:48:59 PDT 2014


Hi Tim,

I thought we had reached agreement at
http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/180464.

So now I'm confused: how can solution 1 still be an option for
big-endian?

Are you changing your mind and arguing for a consistent in-register data
layout between big-endian and little-endian? I'm personally happy with
that, but it won't interoperate with GCC, and it violates AAPCS64, doesn't it?

Thanks,
-Jiangning


2014-04-12 2:01 GMT+08:00 Tim Northover <t.p.northover at gmail.com>:

> >     define void @foo( <4 x i64>* %loadaddr, <4 x i32>* %storeaddr ) {
> >       %1 = load <4 x i64>* %loadaddr
> >       %2 = add <4 x i64> %1, <i64 1, i64 2, i64 3, i64 4>
> >       %3 = trunc <4 x i64> %2 to <4 x i32>
> >       store <4 x i32> %3, <4 x i32>* %storeaddr
> >       ret void
> >     }
> >
> >   Note that the constant value is loaded with ldr q and used by a vector
> > instruction, which swaps the two upper and the two lower vector
> > elements.
>
> I see. I was worried there would be things like this: the remapping
> SDNodeXForm I gave takes care of (most) explicit vector indices, but
> there are still a whole bunch of implicit ones too. In this case,
> it's concat_vectors, which has an even simpler problematic example:
>
> define <4 x i32> @foo(<2 x i32> %l, <2 x i32> %r) {
>   %res = shufflevector <2 x i32> %l, <2 x i32> %r, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
>   ret <4 x i32> %res
> }
>
> All our patterns assume that the LHS of a concat goes in lanes
> [0..N-1] and the RHS goes in [N..2N-1]. This is no longer correct in
> the LDR-based big-endian world. We probably have to add to the list:
> extract_subvector and scalar_to_vector. I can't *think* of any others,
> but that doesn't mean they don't exist. Almost certainly any
> instruction ending in a "2" is going to be affected.
>
> Now, onto solutions...
>
> 1. It's not necessarily too late to switch to canonicalising on
> ld1/st1 for the in-register representation. As before, the trade-offs
> are: bitcast is odd (there's a sketch of this below); function calls &
> returns are odd; vector indices are normal though. The generated code is
> probably less efficient overall, but the implementation is probably
> quite a lot simpler.
>
> 2. We could obviously split every pattern involving the problematic
> nodes into IsLE and IsBE variants. That's the sledge-hammer solution,
> and is very ugly.
>
> 3. We could create a new AArch64ISD::CONCAT_LOHI or something during
> ISelLowering that reversed it in the BE cases. This probably has nasty
> implications for DAGCombiner so it's not really a good idea.
>
> 4. Swapping the arguments to all concat_vectors during ISelLowering is
> probably a very bad idea, but I'll mention it for completeness. The
> DAGCombiner will not be expecting that switch to have happened and
> will almost certainly get it wrong (e.g. by combining
> "(extract_element (concat_vectors L, R), 0)" to "(extract_element L,
> 0)").
>
> I can't think of any way to make this problem go away with clever
> patterns at the moment.
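>
> For reference, the "bitcast is odd" trade-off in option 1 amounts to
> this (again just a sketch): with an ld1/st1-canonicalised representation
> on big-endian, the no-op-looking cast below needs an in-register byte
> rearrangement (something like a REV32), because IR bitcast is defined
> via memory layout (store one type, load the other).
>
> define <16 x i8> @cast(<4 x i32> %v) {
>   %b = bitcast <4 x i32> %v to <16 x i8>
>   ret <16 x i8> %b
> }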
>
> Cheers.
>
> Tim.
>



-- 
Thanks,
-Jiangning

