[PATCH] [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2

Sun Jan 4 16:04:01 PST 2015

Really sorry Simon, I lost track of this over the holidays as I became distracted with SROA stuff.


REPOSITORY
  rL LLVM

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7737-7740
@@ -7747,4 +7736,6 @@
     if (ZeroableRight) {
-      bool ValidShiftRight1 = isSequential(Shift, 0, Size - Shift, 0, Mask);
-      bool ValidShiftRight2 = isSequential(Shift, 0, Size - Shift, Size, Mask);
+      bool ValidShiftRight1 =
+          isSequentialOrUndefInRange(Mask, 0, Size - Shift, Shift);
+      bool ValidShiftRight2 =
+          isSequentialOrUndefInRange(Mask, 0, Size - Shift, Size + Shift);
 
----------------
Please go ahead and commit these bits and the comment update as a separate change. This change should focus on adding bit-shift variants.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7807-7809
@@ +7806,5 @@
+
+  for (auto map : ShiftMapping) {
+    if (VT.SimpleTy != map[0])
+      continue;
+
----------------
The first thing I think needs to be re-worked here is to change how you're computing the types to work with.

The table solution with a simple continue is really bad IMO. I had to spend some time thinking to see the best way to do this.

At first, I thought the best approach would be to build a function or an actual map from VT to the range of possible other vector types that we could use in combination with a shift. Then the outer loop would just be over the specific candidate types.

But then it occurred to me that you don't need to hard code any of these things.

Given an integer VT with N elements and M bits, you can just divid N by 2 and multiply M by 2 on each iteration as long as N > 1 and M <= 64. That will walk over all possible re-slicings of the vector type.

For x86 where the full permutation of types are legal, I would just add an assert that the resulting vector type is legal. This completely removes the need for a table or SSE2 vs AVX2 distinctions. And it leaves you with a single loop with no continues.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7826-7841
@@ +7825,18 @@
+      bool ZeroableRight = true;
+      for (int i = 0, e = Size * Scale; i != e; i += Scale) {
+        for (int j = Scale - Shift; j < Scale; j++) {
+          ZeroableRight &= Zeroable[i + j];
+        }
+      }
+
+      if (ZeroableRight) {
+        bool ValidShiftRight1 = true;
+        bool ValidShiftRight2 = true;
+
+        for (unsigned i = 0, e = Size * Scale; i != e; i += Scale) {
+          ValidShiftRight1 &= isSequentialOrUndefInRange(
+              Mask, i, Scale - Shift, i + Shift);
+          ValidShiftRight2 &= isSequentialOrUndefInRange(
+              Mask, i, Scale - Shift, i + Shift + Mask.size());
+        }
+
----------------
You can merge all of these tests into a single predicate function which has an outer loop containing an inner loop and two calls to isSequentialOrUndefInRange.

Then I suspect you can merge this predicate with the Left variant by providing it as a parameter which half to examine [Scale - Shift, Scale) for right, and [0, Shift) for left. You might need one other parameter, but still, I think its worth folding them.

By putting them into a predicate function, you don't need 3 boolean variables, and you can early exit as soon as you determine "no".

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7844-7849
@@ +7843,8 @@
+        if (ValidShiftRight1 || ValidShiftRight2) {
+          // Cast the inputs to ShiftVT to match VSRLI and then back again.
+          SDValue &TargetV = ValidShiftRight1 ? V1 : V2;
+          SDValue V = DAG.getNode(ISD::BITCAST, DL, ShiftVT, TargetV);
+          SDValue Shifted = DAG.getNode(X86ISD::VSRLI, DL, ShiftVT, V,
+                                        DAG.getConstant(ShiftAmt, MVT::i8));
+          return DAG.getNode(ISD::BITCAST, DL, VT, Shifted);
+        }
----------------
Once you make everything a nice re-usable predicate for testing both right and left shifts, you can write this code once and just have a conditional to select which instruction (right or left) is used.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7787-7790
@@ +7786,6 @@
+
+    // We can shift the elements of the integer vector by whole multiples of
+    // their width within the elements of the larger integer vector. Test each
+    // multiple to see if we can find a match with the moved element indices
+    // and that the shifted in elements are all zeroable.
+    for (int Shift = 1; Shift != Scale; Shift++) {
----------------
[IGNORE THIS: phabricator won't let me delete the comment]

http://reviews.llvm.org/D6649

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/