[PATCH] ARM NEON: Handle v16i8 and v8i16 reverse shuffles

Sat Feb 9 13:14:42 PST 2013

Updated patch.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARM-NEON-Handle-v16i8-and-v8i16-reverse-shuffles.patch
Type: application/octet-stream
Size: 5064 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130209/c87255ff/attachment.obj>
-------------- next part --------------

On Feb 9, 2013, at 2:43 PM, Nadav Rotem <nrotem at apple.com> wrote:

> +/// If the mask is a reverse operation on a v16i8/v8i16 vector we can
> +/// implement it in terms of a vrev64 and vext operation.
> +/// \return true if this is a reverse operation on a v16i8/v8i16 vector.
> +static bool isReverse_v16i8_or_v8i16_Mask(ArrayRef<int> M, EVT VT) {
> +  if (VT != MVT::v16i8 && VT != MVT::v8i16)
> +      return false;
> +
> +  unsigned NumElts = VT.getVectorNumElements();
> +  // Make sure the mask has the right size.
> +  if (NumElts != M.size())
> +      return false;
> +
> +  // Look for <15, ..., 3, -1, 1, 0>.
> +  for (unsigned i = 0; i != NumElts; ++i)
> +    if (M[i] >= 0 && M[i] != (int) (NumElts - 1 - i))
> +      return false;
> +
> +  return true;
> +}
> 
> In the future we may want to add support for additional types (such as floats). Can we make this function generic and check for the types outside the function?

We already have code that handles those types. If the shuffle can be supported directly using vrev, we have isVREVMask. If it can be supported using a combination of shuffles, we use a shuffle table. This shuffle table only supports up to 4 vector elements (we probably did not want it to get too huge; it is 26K right now), which is why we currently don't handle v16i8/v8i16.
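
For illustration, here is a minimal sketch of why a vrev64 followed by a vext gives a full reverse for v16i8/v8i16. It models the two operations on plain arrays and is not the code from the patch: the helper names (modelVREV64, modelVEXTSelf, modelReverse128) are made up for this example, and it assumes the vext takes the same register for both operands and starts at the middle element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Model of vrev64: reverse the elements inside each 64-bit doubleword.
// EltsPerDWord is 8 for v16i8 and 4 for v8i16.
std::vector<int> modelVREV64(std::vector<int> V, std::size_t EltsPerDWord) {
  assert(EltsPerDWord != 0 && V.size() % EltsPerDWord == 0);
  for (std::size_t I = 0; I < V.size(); I += EltsPerDWord)
    std::reverse(V.begin() + I, V.begin() + I + EltsPerDWord);
  return V;
}

// Model of vext with the same vector as both operands: read V.size()
// elements from the concatenation V:V, starting at index Start.
std::vector<int> modelVEXTSelf(const std::vector<int> &V, std::size_t Start) {
  assert(Start <= V.size());
  std::vector<int> R;
  R.reserve(V.size());
  for (std::size_t I = 0; I < V.size(); ++I)
    R.push_back(V[(Start + I) % V.size()]);
  return R;
}

// Full reverse of a 128-bit vector: reverse within each doubleword, then
// swap the two doublewords.
std::vector<int> modelReverse128(const std::vector<int> &V) {
  std::size_t Half = V.size() / 2;
  return modelVEXTSelf(modelVREV64(V, Half), Half);
}
```

For v8i16, modelVREV64 turns <0, ..., 7> into <3, 2, 1, 0, 7, 6, 5, 4>, and the vext starting at element 4 then swaps the halves to give <7, ..., 0>. The v16i8 case is the same with 8 elements per doubleword.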

> +  if (isReverse_v16i8_or_v8i16_Mask(ShuffleMask, VT))
> +    return LowerReverse_VECTOR_SHUFFLEv16i8_v8i16(Op, DAG);
> +
> 
> Can become:
> 
> +  if (isReverseMask(ShuffleMask, VT) && (VT == MVT::v16i8 || VT == MVT::v8i16))
> +    return LowerReverse_VECTOR_SHUFFLEv16i8_v8i16(Op, DAG);
> +

Sure.
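
A rough standalone sketch of that split, with the reverse check made generic and the type restriction moved to the caller. It uses std::vector and a made-up VecTy enum in place of ArrayRef and EVT so it compiles outside of LLVM; the numElts and usesVrev64VextReverse helpers are hypothetical, and the loop body is copied from the check in the patch above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the vector types named in this thread.
enum class VecTy { v16i8, v8i16, v4i32 };

// Number of elements for each stand-in type.
unsigned numElts(VecTy VT) {
  switch (VT) {
  case VecTy::v16i8: return 16;
  case VecTy::v8i16: return 8;
  case VecTy::v4i32: return 4;
  }
  return 0;
}

// Generic reverse check: true when every defined mask element I selects
// element NumElts - 1 - I. Negative entries are undef and match anything.
bool isReverseMask(const std::vector<int> &M, unsigned NumElts) {
  // Make sure the mask has the right size.
  if (M.size() != NumElts)
    return false;

  for (unsigned I = 0; I != NumElts; ++I)
    if (M[I] >= 0 && M[I] != static_cast<int>(NumElts - 1 - I))
      return false;

  return true;
}

// The caller decides which types take the vrev64 + vext path.
bool usesVrev64VextReverse(const std::vector<int> &M, VecTy VT) {
  return isReverseMask(M, numElts(VT)) &&
         (VT == VecTy::v16i8 || VT == VecTy::v8i16);
}
```

With this shape, a v4i32 reverse mask still passes isReverseMask but is rejected by the caller, so it continues to go through the existing paths (isVREVMask or the shuffle table).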