[PATCH] ARM NEON: Handle v16i8 and v8i16 reverse shuffles

Sat Feb 9 11:31:31 PST 2013

Looks good.

On 9 February 2013 19:02, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> Lower reverse shuffles to a vrev64 and a vext instruction instead of the
> default legalization of storing and loading to the stack. This is important
> because we generate reverse shuffles in the loop vectorizer when we reverse
> store to an array.
>
>   uint8_t Arr[N];
>   for (i = 0; i < N; ++i)
>     Arr[N - i - 1] = …
>
> For v8i16 we now generate something like:
>
>   vrev64.16       q9, q9
>   vext.16 q9, q9, q9, #4
>
> instead of:
>
>   orr r1, r0, #14
>   vst1.16 {d16[0]}, [r1, :16]
>   orr r1, r0, #12
>   vst1.16 {d16[1]}, [r1, :16]
>   orr r1, r0, #10
>   vst1.16 {d16[2]}, [r1, :16]
>   orr r1, r0, #8
>   vst1.16 {d16[3]}, [r1, :16]
>   orr r1, r0, #6
>   vst1.16 {d17[0]}, [r1, :16]
>   orr r1, r0, #4
>   vst1.16 {d17[1]}, [r1, :16]
>   orr r1, r0, #2
>   vst1.16 {d17[2]}, [r1, :16]
>   vst1.16 {d17[3]}, [r0, :16]
>   vld1.64 {d16, d17}, [r0, :128]
>
>
> For v16i8 we now generate something like:
>
>   vrev64.8        q8, q8
>   vext.8  q8, q8, q8, #8
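
To see why the two-instruction sequence yields a full reverse, here is a
minimal scalar sketch in C of the v8i16 case. It is only a model of the lane
movement, not code from the patch: the names model_vrev64_16, model_vext_16
and model_reverse_v8i16 are hypothetical, and the lane numbering assumes
lane 0 is the lowest element of the q register.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 8 };  /* v8i16: eight 16-bit lanes in one 128-bit q register */

/* Hypothetical model of "vrev64.16 qD, qM": reverse the four 16-bit lanes
 * inside each 64-bit half of the register. */
static void model_vrev64_16(uint16_t dst[LANES], const uint16_t src[LANES]) {
    uint16_t tmp[LANES];
    for (int half = 0; half < 2; ++half)
        for (int i = 0; i < 4; ++i)
            tmp[half * 4 + i] = src[half * 4 + (3 - i)];
    memcpy(dst, tmp, sizeof tmp);
}

/* Hypothetical model of "vext.16 qD, qN, qM, #imm": take lanes imm..7 of n,
 * followed by lanes 0..imm-1 of m. */
static void model_vext_16(uint16_t dst[LANES], const uint16_t n[LANES],
                          const uint16_t m[LANES], int imm) {
    uint16_t tmp[LANES];
    for (int i = 0; i < LANES; ++i) {
        int k = i + imm;
        tmp[i] = (k < LANES) ? n[k] : m[k - LANES];
    }
    memcpy(dst, tmp, sizeof tmp);
}

/* The sequence from the patch description: vrev64.16 followed by
 * vext.16 with the same register for both sources and #4. */
static void model_reverse_v8i16(uint16_t dst[LANES], const uint16_t src[LANES]) {
    uint16_t t[LANES];
    model_vrev64_16(t, src);      /* {0..7} -> {3,2,1,0,7,6,5,4} */
    model_vext_16(dst, t, t, 4);  /* swap the two 64-bit halves  */
}
```

Calling model_reverse_v8i16 on the lanes {0,1,2,3,4,5,6,7} gives
{7,6,5,4,3,2,1,0}. The v16i8 case works the same way with sixteen 8-bit
lanes and an immediate of #8: vrev64 reverses within each half, and vext
then swaps the halves.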