[PATCH] D13364: [x86] PR24562: fix incorrect folding of X86ISD::PSHUFB nodes that have a mask of all indices with the most significant bit set.

Andrea Di Biagio via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 1 13:20:04 PDT 2015


andreadb created this revision.
andreadb added reviewers: qcolombet, chandlerc, spatel, RKSimon.
andreadb added a subscriber: llvm-commits.

This patch fixes a problem in function 'combineX86ShuffleChain' that causes a chain of shuffles to be wrongly folded away when the combined shuffle mask has only one element.

We may end up with a combined shuffle mask of one element as a result of multiple calls to function 'canWidenShuffleElements'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening the size of the elements being shuffled.
For every pair of shuffle indices, it checks whether the two indices refer to adjacent elements.
If all pairs refer to "adjacent" elements then the shuffle mask can be safely widened. As a consequence of widening, we end up with a new shuffle mask which is half the size of the original shuffle mask.
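
For illustration, one widening step can be modeled roughly as follows. This is a simplified standalone sketch, not the code in X86ISelLowering.cpp: the name widenOnce and the constant kZero (standing in for SM_SentinelZero) are made up for the example, and undef indices are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SM_SentinelZero; the value is an assumption
// made for this sketch only.
constexpr int kZero = -2;

// Try to merge every pair of indices into one index over elements that are
// twice as wide. On success, 'widened' holds a mask of half the length.
// Returns false as soon as one pair cannot be merged.
bool widenOnce(const std::vector<int> &mask, std::vector<int> &widened) {
  assert(mask.size() % 2 == 0 && "mask must have an even number of elements");
  widened.clear();
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    int lo = mask[i], hi = mask[i + 1];
    if (lo == kZero && hi == kZero) {
      // Two zeroed narrow elements form one zeroed wide element.
      widened.push_back(kZero);
    } else if (lo >= 0 && lo % 2 == 0 && hi == lo + 1) {
      // An aligned, adjacent pair selects one whole wide element.
      widened.push_back(lo / 2);
    } else {
      return false;
    }
  }
  return true;
}
```

Each successful call halves the mask length, which is why repeated calls on a suitable mask can eventually produce a mask with a single element.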

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero indices.
Function canWidenShuffleElements would combine each pair of SM_SentinelZero indices into a single SM_SentinelZero index. So, in a logarithmic number of steps (4 in this case, going from 16 indices to 8, 4, 2 and finally 1), the pshufb mask is simplified to a mask with only one index, which is equal to SM_SentinelZero.
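
The mask in the test decodes to all SM_SentinelZero because of the PSHUFB rule that a control byte with its most significant bit set zeroes the corresponding destination byte. A scalar model of the 128-bit form, written only as an illustration (the name pshufb128 is hypothetical), might look like this:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of 128-bit PSHUFB: for each lane, a control byte with bit 7
// set produces zero; otherwise its low 4 bits select a byte from 'src'.
std::array<std::uint8_t, 16> pshufb128(const std::array<std::uint8_t, 16> &src,
                                       const std::array<std::uint8_t, 16> &ctrl) {
  std::array<std::uint8_t, 16> out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = (ctrl[i] & 0x80) ? std::uint8_t{0} : src[ctrl[i] & 0x0F];
  return out;
}
```

With every control byte equal to i8 -1 (0xFF), as in pr24562.ll, every output byte is zero regardless of the first operand.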

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask of size one is always equivalent to an identity mask. So, the entire shuffle chain was just folded away as the combined shuffle mask was treated as a no-op mask.

With this patch we now check whether the only element of the combined shuffle mask is SM_SentinelZero. If it is, we propagate a zero vector instead of the shuffle input.
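
Stripped of the SelectionDAG details, the new decision can be sketched as below. The enum Fold, the function foldSingleElementMask and the constant kZero are hypothetical names used only for this example; the real change is in the diff that follows.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SM_SentinelZero (value assumed for this sketch).
constexpr int kZero = -2;

// The two possible replacements for the root of the shuffle chain.
enum class Fold { ZeroVector, PassThroughInput };

// Decide how to fold a shuffle chain whose accumulated mask has been
// widened down to a single element.
Fold foldSingleElementMask(const std::vector<int> &mask) {
  assert(mask.size() == 1 && "only meaningful for a one-element mask");
  // A lone zero sentinel means the whole result is zero, not a no-op.
  return mask[0] == kZero ? Fold::ZeroVector : Fold::PassThroughInput;
}
```

Before the patch, both cases took the PassThroughInput path, which is what caused the miscompile.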

Please let me know if ok to submit.

Thanks,
Andrea

http://reviews.llvm.org/D13364

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/pr24562.ll

Index: test/CodeGen/X86/pr24562.ll
===================================================================
--- test/CodeGen/X86/pr24562.ll
+++ test/CodeGen/X86/pr24562.ll
@@ -0,0 +1,19 @@
+; RUN: llc -mattr=+ssse3 -mtriple=x86_64-unknown-unknown < %s | FileCheck %s
+
+; The pshufb from function @pr24562 was wrongly folded into its first operand
+; as a result of a late target shuffle combine on the legalized selection dag.
+; 
+; Check that the pshufb is correctly folded to a zero vector.
+
+define <2 x i64> @pr24562() {
+; CHECK-LABEL: pr24562:
+; CHECK:       # BB#0: # %entry
+; CHECK-NEXT:    xorps %xmm0, %xmm0
+; CHECK-NEXT:    retq
+entry:
+  %0 = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>) #2
+  %1 = bitcast <16 x i8> %0 to <2 x i64>
+  ret <2 x i64> %1
+}
+
+declare <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -21962,10 +21962,18 @@
   MVT RootVT = Root.getSimpleValueType();
   SDLoc DL(Root);
 
-  // Just remove no-op shuffle masks.
   if (Mask.size() == 1) {
-    DCI.CombineTo(Root.getNode(), DAG.getBitcast(RootVT, Input),
-                  /*AddTo*/ true);
+    // We may end up with an accumulated mask of size 1 as a result of
+    // widening of shuffle operands (see function canWidenShuffleElements).
+    // If the only shuffle index is equal to SM_SentinelZero then propagate
+    // a zero vector.
+    if (Mask[0] == SM_SentinelZero)
+      // fold this shuffle chain to a zero vector.
+      DCI.CombineTo(Root.getNode(), getZeroVector(RootVT, Subtarget, DAG, DL));
+    else
+      // Just remove no-op shuffle masks.
+      DCI.CombineTo(Root.getNode(), DAG.getBitcast(RootVT, Input),
+                    /*AddTo*/ true);
     return true;
   }
 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D13364.36281.patch
Type: text/x-patch
Size: 2120 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151001/59cdf57b/attachment.bin>

