[PATCH] ARM NEON: Handle v16i8 and v8i16 reverse shuffles
Nadav Rotem
nrotem at apple.com
Sat Feb 9 12:43:33 PST 2013
+/// If the mask is a reverse operation on a v16i8/v8i16 vector, we can
+/// implement it in terms of a vrev64 and a vext operation.
+/// \return true if this is a reverse operation on a v16i8/v8i16 vector.
+static bool isReverse_v16i8_or_v8i16_Mask(ArrayRef<int> M, EVT VT) {
+ if (VT != MVT::v16i8 && VT != MVT::v8i16)
+ return false;
+
+ unsigned NumElts = VT.getVectorNumElements();
+ // Make sure the mask has the right size.
+ if (NumElts != M.size())
+ return false;
+
+ // Look for <15, ..., 3, -1, 1, 0>.
+ for (unsigned i = 0; i != NumElts; ++i)
+ if (M[i] >= 0 && M[i] != (int) (NumElts - 1 - i))
+ return false;
+
+ return true;
+}
In the future we may want to add support for additional types (such as floats). Can we make this function generic and check for the types outside the function?
+ if (isReverse_v16i8_or_v8i16_Mask(ShuffleMask, VT))
+ return LowerReverse_VECTOR_SHUFFLEv16i8_v8i16(Op, DAG);
+
Can become:
+ if (isReverseMask(ShuffleMask, VT) && (VT == MVT::v16i8 || VT == MVT::v8i16))
+ return LowerReverse_VECTOR_SHUFFLEv16i8_v8i16(Op, DAG);
+
Thanks,
Nadav
On Feb 9, 2013, at 11:02 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Lower reverse shuffles to a vrev64 and a vext instruction instead of the default
> legalization of storing and loading to the stack. This is important because we
> generate reverse shuffles in the loop vectorizer when we reverse store to an
> array.
>
> uint8_t Arr[N];
> for (i = 0; i < N; ++i)
> Arr[N - i - 1] = …
>
> For v8i16 we now generate something like:
>
> vrev64.16 q9, q9
> vext.16 q9, q9, q9, #4
>
> instead of:
>
> orr r1, r0, #14
> vst1.16 {d16[0]}, [r1, :16]
> orr r1, r0, #12
> vst1.16 {d16[1]}, [r1, :16]
> orr r1, r0, #10
> vst1.16 {d16[2]}, [r1, :16]
> orr r1, r0, #8
> vst1.16 {d16[3]}, [r1, :16]
> orr r1, r0, #6
> vst1.16 {d17[0]}, [r1, :16]
> orr r1, r0, #4
> vst1.16 {d17[1]}, [r1, :16]
> orr r1, r0, #2
> vst1.16 {d17[2]}, [r1, :16]
> vst1.16 {d17[3]}, [r0, :16]
> vld1.64 {d16, d17}, [r0, :128]
>
>
> For v16i8 we now generate something like:
>
> vrev64.8 q8, q8
> vext.8 q8, q8, q8, #8
>
> <0001-ARM-NEON-Handle-v16i8-and-v8i16-reverse-shuffles.patch>
>
More information about the llvm-commits mailing list