[PATCH] [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2

Fri Oct 17 11:11:32 PDT 2014

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7451-7456
@@ -7450,7 +7450,8 @@
 ///
-/// We have a generic PALIGNR instruction in x86 that will do an arbitrary
-/// byte-rotation of the concatenation of two vectors. This routine will
+/// SSSE3 has a generic PALIGNR instruction in x86 that will do an arbitrary
+/// byte-rotation of the concatenation of two vectors; pre-SSSE3 can use
+/// a PSRLDQ/PSLLDQ/POR pattern to get a similar effect. This routine will
 /// try to generically lower a vector shuffle through such an instruction. It
 /// does not check for the availability of PALIGNR-based lowerings, only the
 /// applicability of this strategy to the given mask. This matches shuffle
 /// vectors that look like:
----------------
The rest of this comment is now misleading.

I think it should now say something like "It does not check for the profitability of lowering either as PALIGNR or PSRLDQ/PSLLDQ/POR, only whether the mask is valid to lower in that form."

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7550-7552
@@ +7549,5 @@
+
+  // On pre-SSSE3 targets we will be able to shuffle/unpack more efficiently for
+  // 32-bit or 64-bit integers than using the PSRLDQ/PSLLDQ/POR pattern.
+  if(VT.getScalarSizeInBits() < 32) {
+    // Cast the inputs to v2i64 to match PSLLDQ/PSRLDQ.
----------------
Rather than this, I would suggest testing for SSSE3 prior to calling this in the routines where it is only profitable with SSSE3. That localizes the profitability breakdown to those routines which are already broken out by element size.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7610-7616
@@ -7584,1 +7609,9 @@
 
+static SDValue lowerVectorShuffleAsByteShift(SDLoc DL, MVT VT, SDValue V1,
+                                             SDValue V2,
+                                             ArrayRef<int> Mask,
+                                             SelectionDAG &DAG)
+{
+  assert(!isNoopShuffleMask(Mask) && "We shouldn't lower no-op shuffles!");
+
+  SmallBitVector Zeroable = computeZeroableShuffleElements(Mask, V1, V2);
----------------
The formatting here isn't right. Can you use clang-format? The curly at the least should be on the prior line.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7634-7641
@@ +7633,10 @@
+      continue;
+    int Candidate1 = i >= Size - RShift1 ? RShift1 : Mask[i] - i;
+    int Candidate2 = i >= Size - RShift2 ? RShift2 : Mask[i] - (i + Size);
+    ValidRShift1 &= (0 < Candidate1 && Candidate1 < Size);
+    ValidRShift2 &= (0 < Candidate2 && Candidate2 < Size);
+    RShift1 = RShift1 == 0 ? Candidate1 : RShift1;
+    RShift2 = RShift2 == 0 ? Candidate2 : RShift2;
+    ValidRShift1 &= (i >= Size - RShift1 ? Zeroable[i] : RShift1 == Candidate1);
+    ValidRShift2 &= (i >= Size - RShift2 ? Zeroable[i] : RShift2 == Candidate2);
+  }
----------------
The bitwise ands here are weird, and I think you'll need some comments explaining the nature of your search strategy here.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7655
@@ +7654,3 @@
+
+  // PSLLDQ : (little-endian) left byte shift
+  // [ zz,  0,  1,  2,  3,  4,  5,  6]
----------------
It would be really nice to write this logic once in a lambda, and run it over a reversed mask.

http://reviews.llvm.org/D5699