[PATCH][x86] Teach how to fold target specific dags for packed shifts by immedate count into build_vector.

Fri Dec 27 09:52:21 PST 2013

Hi Nadav,

On Thu, Dec 26, 2013 at 6:57 AM, Nadav Rotem <nrotem at apple.com> wrote:
> Overall the patch LGTM.  Do we have something like that for the non-target-specific code?  If not then we should move it there and figure out a way to handle the target-specific x86 nodes.

Thanks for the feedback!
I created a new patch that fixes the original poor code-gen bug in the
target independent part.
I managed to add a new rule that teaches the DAGCombiner how to fold a
sext_inreg of a build_vector of ConstantSDNodes (or UNDEFs) into a
build_vector.

The differences with respect to my previous patch are:
 1) this new patch does not handle the target specific packed vector
shift nodes;
 2) I had to touch test test/CodeGen/X86/blend-msb.ll (see below).

About point 1.
For simplicity I plan to send a separate patch that adds target
specific combine rules to fold packed vector shifts by immediate count
(VSHLI/VSRLI/VSRAI) into a build_vector in the case where the vector
to shift is a vector of constants (or undefs).

About point 2.
With this fix, we now generate a slightly different (but still valid)
mask for the 'blendvps' instruction expected by test
test/CodeGen/X86/blend-msb.ll.

-;CHECK: movl $-2147483648
+;CHECK: movl $-1

The test originally expected a Mask of $-2147483648 in input to the
'blendvps'; with my patch however however we produce $-1.
The two masks are both ok; the difference is that before we were able
to simplify the constant value for the mask knowing that 'blendvps'
only cares about specific bits in the sequence.

The target specific VSELECT combine uses method SimplifyDemandedBits
to 'simplify' the value of the mask in input to the blend.
It looks like method 'TargetLowering::SimplifyDemandedBits' knows how
to simplify the Constant Mask in input to a vector SELECT node when it
is generated by a SIGN_EXTEND_INREG; it doesn't seem to know how to
handle the case where the Mask in input is a BUILD_VECTOR.
I decided for now to fix the test modifying the expected number in
input to the movl.

> +// Return true if this is a BUILD_VECTOR of ConstantSDNodes.
> +// This is a helper function used by `getTargetVShiftByConstNode`.
> +static bool isBuildVectorOfConstantSDNodes(SDValue N) {
> +  if (N->getOpcode() != ISD::BUILD_VECTOR)
> +    return false;
> +
> +  for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
> +    if (!isa<ConstantSDNode>(N->getOperand(i)))
> +      return false;
> +  return true;
> +}
> +
>
> You can also handle UNDEFs here.

I moved this function in the ISD namespace into
lib/CodeGen/SelectionDAG/SelectionDAG.cpp and now it also accepts
UNDEFs.

Please let me know what you think about this new patch and if it is ok
to submit.

Thanks!
Andrea
-------------- next part --------------
Index: test/CodeGen/X86/sse2-blend.ll
===================================================================

--- test/CodeGen/X86/sse2-blend.ll	(revision 198076)
+++ test/CodeGen/X86/sse2-blend.ll	(working copy)
@@ -1,9 +1,9 @@
 ; RUN: llc < %s -march=x86 -mcpu=yonah -mattr=+sse2,-sse4.1 | FileCheck %s
 
 ; CHECK: vsel_float
-; CHECK: pandn
-; CHECK: pand
-; CHECK: por
+; CHECK: xorps
+; CHECK: movss
+; CHECK: orps
 ; CHECK: ret
 define void at vsel_float(<4 x float>* %v1, <4 x float>* %v2) {
   %A = load <4 x float>* %v1
@@ -14,9 +14,9 @@
 }
 
 ; CHECK: vsel_i32
-; CHECK: pandn
-; CHECK: pand
-; CHECK: por
+; CHECK: xorps
+; CHECK: movss
+; CHECK: orps
 ; CHECK: ret
 define void at vsel_i32(<4 x i32>* %v1, <4 x i32>* %v2) {
   %A = load <4 x i32>* %v1
Index: test/CodeGen/X86/vselect.ll
===================================================================
--- test/CodeGen/X86/vselect.ll	(revision 0)
+++ test/CodeGen/X86/vselect.ll	(revision 0)
@@ -0,0 +1,133 @@
+; RUN: llc -march=x86-64 -mcpu=corei7 -mattr=-sse4.1 < %s | FileCheck %s
+
+; Verify that we don't emit packed vector shifts instructions if the
+; condition used by the vector select is a vector of constants.
+
+
+define <4 x float> @test1(<4 x float> %a, <4 x float> %b) {
+  %1 = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>, <4 x float> %a, <4 x float> %b
+  ret <4 x float> %1
+}
+; CHECK-LABEL: test1
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
+define <4 x float> @test2(<4 x float> %a, <4 x float> %b) {
+  %1 = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> %a, <4 x float> %b
+  ret <4 x float> %1
+}
+; CHECK-LABEL: test2
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
+define <4 x float> @test3(<4 x float> %a, <4 x float> %b) {
+  %1 = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> %a, <4 x float> %b
+  ret <4 x float> %1
+}
+; CHECK-LABEL: test3
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
+define <4 x float> @test4(<4 x float> %a, <4 x float> %b) {
+  %1 = select <4 x i1> <i1 false, i1 false, i1 false, i1 false>, <4 x float> %a, <4 x float> %b
+  ret <4 x float> %1
+}
+; CHECK-LABEL: test4
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: movaps  %xmm1, %xmm0
+; CHECK: ret
+
+
+define <4 x float> @test5(<4 x float> %a, <4 x float> %b) {
+  %1 = select <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> %a, <4 x float> %b
+  ret <4 x float> %1
+}
+; CHECK-LABEL: test5
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
+define <8 x i16> @test6(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false>, <8 x i16> %a, <8 x i16> %a
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test6
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
+define <8 x i16> @test7(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test7
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
+define <8 x i16> @test8(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 true, i1 true>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test8
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+define <8 x i16> @test9(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test9
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: movaps  %xmm1, %xmm0
+; CHECK-NEXT: ret
+
+define <8 x i16> @test10(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test10
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+define <8 x i16> @test11(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 false, i1 true, i1 true, i1 false, i1 undef, i1 true, i1 true, i1 undef>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test11
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+define <8 x i16> @test12(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 false, i1 false, i1 undef, i1 false, i1 false, i1 false, i1 false, i1 undef>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test12
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+define <8 x i16> @test13(<8 x i16> %a, <8 x i16> %b) {
+  %1 = select <8 x i1> <i1 undef, i1 undef, i1 undef, i1 undef, i1 undef, i1 undef, i1 undef, i1 undef>, <8 x i16> %a, <8 x i16> %b
+  ret <8 x i16> %1
+}
+; CHECK-LABEL: test13
+; CHECK-NOT: psllw
+; CHECK-NOT: psraw
+; CHECK: ret
+
+
Index: test/CodeGen/X86/blend-msb.ll
===================================================================
--- test/CodeGen/X86/blend-msb.ll	(revision 198076)
+++ test/CodeGen/X86/blend-msb.ll	(working copy)
@@ -5,7 +5,7 @@
 ; shifting the needed bit to the MSB, and not using shl+sra.
 
 ;CHECK-LABEL: vsel_float:
-;CHECK: movl $-2147483648
+;CHECK: movl $-1
 ;CHECK-NEXT: movd
 ;CHECK-NEXT: blendvps
 ;CHECK: ret
@@ -15,7 +15,7 @@
 }
 
 ;CHECK-LABEL: vsel_4xi8:
-;CHECK: movl $-2147483648
+;CHECK: movl $-1
 ;CHECK-NEXT: movd
 ;CHECK-NEXT: blendvps
 ;CHECK: ret
@@ -26,12 +26,12 @@
 
 
 ; We do not have native support for v8i16 blends and we have to use the
-; blendvb instruction or a sequence of NAND/OR/AND. Make sure that we do not r
+; blendvb instruction or a sequence of NAND/OR/AND. Make sure that we do not
 ; reduce the mask in this case.
 ;CHECK-LABEL: vsel_8xi16:
-;CHECK: psllw
-;CHECK: psraw
-;CHECK: pblendvb
+;CHECK: andps
+;CHECK: andps
+;CHECK: orps
 ;CHECK: ret
 define <8 x i16> @vsel_8xi16(<8 x i16> %v1, <8 x i16> %v2) {
   %vsel = select <8 x i1> <i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false>, <8 x i16> %v1, <8 x i16> %v2
Index: test/CodeGen/X86/2011-12-28-vselecti8.ll
===================================================================
--- test/CodeGen/X86/2011-12-28-vselecti8.ll	(revision 198076)
+++ test/CodeGen/X86/2011-12-28-vselecti8.ll	(working copy)
@@ -3,10 +3,20 @@
 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
 target triple = "x86_64-apple-darwin11.2.0"
 
-; CHECK: @foo8
-; CHECK: psll
-; CHECK: psraw
-; CHECK: pblendvb
+; During legalization, the vselect mask is 'type legalized' into a 
+; wider BUILD_VECTOR. This causes the introduction of a new
+; sign_extend_inreg in the DAG.
+;
+; A sign_extend_inreg of a vector of ConstantSDNode or undef can be
+; always folded into a simple build_vector.
+;
+; Make sure that the sign_extend_inreg is simplified and that we
+; don't generate psll, psraw and pblendvb from the vselect.
+
+; CHECK-LABEL: foo8
+; CHECK-NOT: psll
+; CHECK-NOT: psraw
+; CHECK-NOT: pblendvb
 ; CHECK: ret
 define void @foo8(float* nocapture %RET) nounwind {
 allocas:
@@ -17,4 +27,3 @@
   ret void
 }
 
-
Index: include/llvm/CodeGen/SelectionDAGNodes.h
===================================================================
--- include/llvm/CodeGen/SelectionDAGNodes.h	(revision 198076)
+++ include/llvm/CodeGen/SelectionDAGNodes.h	(working copy)
@@ -70,6 +70,10 @@
   /// BUILD_VECTOR where all of the elements are 0 or undef.
   bool isBuildVectorAllZeros(const SDNode *N);
 
+  /// \brief Return true if this is a build vector where each element is
+  /// either undef o a ConstantSDNode.
+  bool isBuildVectorOfConstantSDNodes(const SDNode *N);
+
   /// isScalarToVector - Return true if the specified node is a
   /// ISD::SCALAR_TO_VECTOR node or a BUILD_VECTOR node where only the low
   /// element is not an undef.
Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 198076)
+++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
@@ -5511,6 +5511,29 @@
                          BSwap, N1);
   }
 
+  // Fold a sext_inreg of a build_vector of ConstantSDNodes or undefs
+  // into a build_vector.
+  if (ISD::isBuildVectorOfConstantSDNodes(N0.getNode())) {
+    SmallVector<SDValue, 8> Elts;
+    unsigned NumElts = N0->getNumOperands();
+    unsigned ShAmt = VTBits - EVTBits;
+
+    for (unsigned i = 0; i != NumElts; ++i) {
+      SDValue Op = N0->getOperand(i);
+      if (Op->getOpcode() == ISD::UNDEF) {
+        Elts.push_back(Op);
+        continue;
+      }
+
+      ConstantSDNode *CurrentND = cast<ConstantSDNode>(Op);
+      const APInt &C = CurrentND->getAPIntValue();
+      Elts.push_back(DAG.getConstant(C.shl(ShAmt).ashr(ShAmt),
+                                     Op.getValueType()));
+    }
+
+    return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), VT, &Elts[0], NumElts);
+  }
+
   return SDValue();
 }
 
Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 198076)
+++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
@@ -179,6 +179,22 @@
   return true;
 }
 
+/// \brief Return true if the specified node is a BUILD_VECTOR node of
+/// all ConstantSDNode or undef.
+bool ISD::isBuildVectorOfConstantSDNodes(const SDNode *N) {
+  if (N->getOpcode() != ISD::BUILD_VECTOR)
+    return false;
+
+  for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) {
+    SDValue Op = N->getOperand(i);
+    if (Op.getOpcode() == ISD::UNDEF)
+      continue;
+    if (!isa<ConstantSDNode>(Op))
+      return false;
+  }
+  return true;
+}
+
 /// isScalarToVector - Return true if the specified node is a
 /// ISD::SCALAR_TO_VECTOR node or a BUILD_VECTOR node where only the low
 /// element is not an undef.