[PATCH] D19661: [X86] Also try to zero elts when lowering v32i8 shuffle with PSHUFB.

Ahmed Bougacha via llvm-commits llvm-commits at lists.llvm.org
Thu May 5 13:09:30 PDT 2016


ab added a comment.

In http://reviews.llvm.org/D19661#415401, @RKSimon wrote:

> > We could generalize lowerVectorShuffleAsPSHUFB to ymm, but when I tried that was less readable than doing it inline.
>
>
> How much did you have to do to alter lowerVectorShuffleAsPSHUFB? I'd have expected it to end up quite similar to the implementation in combineX86ShuffleChain which I thought was quite straightforward.


Sorry: it's not ymm that's the problem, it's that lowerVectorShuffleAsPSHUFB should really be lowerVectorShuffleAsBlendOfPSHUFBs; if you don't want the final blend, there's not that much you can reuse.
It's also tricky to extract the PSHUFB creation out of this hypothetical lowerVectorShuffleAsBlendOfPSHUFB. It wants a PSHUFB that ignores the lanes from the other operand.  But we want a PSHUFB if there are no lanes from the other operand.

I'll give it another think; maybe we can do better.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:11427-11445
@@ -11426,9 +11426,21 @@
 
-  if (isSingleInputShuffleMask(Mask)) {
-    // There are no generalized cross-lane shuffle operations available on i8
-    // element types.
-    if (is128BitLaneCrossingShuffleMask(MVT::v32i8, Mask))
-      return lowerVectorShuffleAsLanePermuteAndBlend(DL, MVT::v32i8, V1, V2,
-                                                     Mask, DAG);
+  SmallBitVector Zeroable = computeZeroableShuffleElements(Mask, V1, V2);
+  bool SingleInputMask = true;
+  bool SingleInputAndZeroesMask = true;
+  for (int i = 0, Size = Mask.size(); i < Size; ++i) {
+    if (Mask[i] >= Size) {
+      SingleInputMask = false;
+      if (!Zeroable[i]) {
+        SingleInputAndZeroesMask = false;
+        break;
+      }
+    }
+  }
+  // There are no generalized cross-lane shuffle operations available on i8
+  // element types.
+  if (SingleInputMask && is128BitLaneCrossingShuffleMask(MVT::v32i8, Mask))
+    return lowerVectorShuffleAsLanePermuteAndBlend(DL, MVT::v32i8, V1, V2, Mask,
+                                                   DAG);
 
+  if (SingleInputAndZeroesMask) {
     SDValue PSHUFBMask[32];
----------------
I wanted to do that at first, but you need to look at the operands, and at that point you basically duplicated computeZeroableShuffleElements.

An alternative I considered was to do some kind of computeZeroableShuffleMask(Mask, V1, V2, NewMask), which returns a mask with SM_SentinelZero and SM_SentinelUndef. Then, users can check that it isSingleInputShuffleMask and do their thing.  WDYT?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:11447-11455
@@ -11434,8 +11446,11 @@
     SDValue PSHUFBMask[32];
-    for (int i = 0; i < 32; ++i)
-      PSHUFBMask[i] =
-          Mask[i] < 0
-              ? DAG.getUNDEF(MVT::i8)
-              : DAG.getConstant(Mask[i] < 16 ? Mask[i] : Mask[i] - 16, DL,
-                                MVT::i8);
+    for (int i = 0; i < 32; ++i) {
+      if (Mask[i] < 0)
+        PSHUFBMask[i] = DAG.getUNDEF(MVT::i8);
+      else if (Zeroable[i])
+        PSHUFBMask[i] = DAG.getConstant(0x80, DL, MVT::i8);
+      else
+        PSHUFBMask[i] =
+            DAG.getConstant(Mask[i] < 16 ? Mask[i] : Mask[i] - 16, DL, MVT::i8);
+    }
 
----------------
Heh, fair enough, will do!


http://reviews.llvm.org/D19661





More information about the llvm-commits mailing list