[PATCH] [X86] Skip concat_vectors when lowering vector broadcast

Fri Dec 13 12:29:50 PST 2013

Hi Nadav,

On 11 December 2013 21:23, Nadav Rotem <nrotem at apple.com> wrote:
>>
>> Yes, that was my alternative (handle it in DAGCombiner::visitCONCAT_
>> VECTORS).  My fear was that the general combine could get quite
>> complex and as I'm new to LLVM I'd get stuck.  Would it be acceptable
>> to add a simple combine that just handled the case above in the first
>> instance?
>
>
> Sure, lets start by handling the simple case first. :)

So I had a go at implementing a combine - the simple case of
concat_vectors(BUILD_VECTOR(X, undef)) canonicalized into a larger
BUILD_VECTOR.

This was looking good, and it fixed the initial vbroadcast issue
without the need to change the x86 backend code.  However, one of the
LLVM tests failed (CodeGen/X86/avx-shuffle.ll).  Analyzing this showed
the swap8doubles test failed indirectly due to a change in the
expected output.

There's an x86 specific transform in PerformShuffleCombine256
(X86ISelLowering.cpp) that recognizes a vector zero extend.  This
involves a complex pattern of concat_vectors, BUILD_VECTOR and undef
in operands 1 and 2 of a shuffle.  Of course, this is now
canonicalized and the optimization no longer triggers.

I had a go at "fixing" the optimization to handle the canonicalized
pattern instead.  This is fairly trivial for the second operand to the
shuffle (the vector built from zeroes and undef).  Instead of
concat_vectors(BUILD_VECTOR(0,0,0,0), undef)), this is now
BUILD_VECTOR(0,0,0,0,...).  This fixes the failure in the LLVM test.

However, the problem is there is still a case which won't be
recognized.  This is when V (referring to the diagram in the comments)
is itself a BUILD_VECTOR.  In this case, the first operand to the
shuffle will also have been canonicalized.  This is rather more
difficult to handle as you'd need to recognize a BUILD_VECTOR in which
the last half was undef.

Before trying to handle this case I was going to come up with a
testcase to trigger it (the hope being that with both operands to the
shuffle canonicalized, the resultant code without triggering the
optimization may be as good, so I don't need to handle it anyway).

However, this testcase is proving tricky, so I thought I'd ask for
advice before going any further.  Does the approach still sound
sensible?  Is "fixing" PerformShuffleCombine256 the right way, or does
it indicate that there may be other optimizations that will break, and
ultimately that canonicalizing the concat_vectors/BUILD_VECTOR is a
bad idea?

The attached patch contains my modifications as described above...

Thanks,
Rob.
-------------- next part --------------
Index: Target/X86/X86ISelLowering.cpp
===================================================================

--- Target/X86/X86ISelLowering.cpp	(revision 197173)
+++ Target/X86/X86ISelLowering.cpp	(working copy)
@@ -16193,23 +16193,19 @@
   unsigned NumElems = VT.getVectorNumElements();
 
   if (V1.getOpcode() == ISD::CONCAT_VECTORS &&
-      V2.getOpcode() == ISD::CONCAT_VECTORS) {
+      V2.getOpcode() == ISD::BUILD_VECTOR) {
     //
-    //                   0,0,0,...
-    //                      |
-    //    V      UNDEF    BUILD_VECTOR    UNDEF
-    //     \      /           \           /
-    //  CONCAT_VECTOR         CONCAT_VECTOR
+    //    V      UNDEF         0,0,0,... 
+    //     \      /               |
+    //  CONCAT_VECTOR         BUILD_VECTOR
     //         \                  /
     //          \                /
     //          RESULT: V + zero extended
     //
-    if (V2.getOperand(0).getOpcode() != ISD::BUILD_VECTOR ||
-        V2.getOperand(1).getOpcode() != ISD::UNDEF ||
-        V1.getOperand(1).getOpcode() != ISD::UNDEF)
+    if(V1.getOperand(1).getOpcode() != ISD::UNDEF)
       return SDValue();
 
-    if (!ISD::isBuildVectorAllZeros(V2.getOperand(0).getNode()))
+    if (!ISD::isBuildVectorAllZeros(V2.getNode()))
       return SDValue();
 
     // To match the shuffle mask, the first half of the mask should
Index: CodeGen/SelectionDAG/DAGCombiner.cpp
===================================================================
--- CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 197173)
+++ CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
@@ -9893,6 +9893,21 @@
       SDValue Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NVT, Scalar);
       return DAG.getNode(ISD::BITCAST, dl, VT, Res);
     }
+
+    // concat_vectors(build_vector(X), undef) -> build_vector(X, undef,...).
+    if (In->getOpcode() == ISD::BUILD_VECTOR) {
+      SDValue Undef = DAG.getUNDEF(In.getOperand(0).getValueType());
+      unsigned BuildVecNumElts =  In.getNumOperands();
+      SmallVector<SDValue, 8> Opnds;
+
+      for (unsigned i = 0, e = BuildVecNumElts; i != e; ++i)
+        Opnds.push_back(In.getOperand(i));
+      for (unsigned i = 0, e = BuildVecNumElts; i != e; ++i)
+        Opnds.push_back(Undef);
+
+      return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), VT, &Opnds[0],
+                         Opnds.size());
+    }
   }
 
   // Type legalization of vectors and DAG canonicalization of SHUFFLE_VECTOR