[PATCH] Lower certain build_vectors to insertps instructions

Filipe Cabecinhas filcab+llvm.phabricator at gmail.com
Sun Apr 27 11:42:33 PDT 2014

Comment at: lib/Target/X86/X86ISelLowering.cpp:5423
@@ +5422,3 @@
+    SDValue Elem = Op.getOperand(Idx);
+    if (Elem.getOpcode() == ISD::UNDEF || X86::isZeroNode(Elem))
+      continue;
Elena Demikhovsky wrote:
> If it is a zeroNode, somebody should take care for this. I don't understand how do your tests with zeroes work.
This checks whether this element of the build_vector is a zero node.
If it is, we can still do the optimization, since we can insert 0 into any lane we want.

Alternatively, I can simply drop the check for zero or undef. That won't change CorrectIdx, nor will it change the comparison between CorrectIdx and NumNonZero.

Comment at: lib/Target/X86/X86ISelLowering.cpp:5447
@@ +5446,3 @@
+      FirstNonZeroIdx << 6 | FirstNonZeroIdx << 4 | (~NonZeros & 0xf));
+  return DAG.getNode(X86ISD::INSERTPS, dl, VT, V, V, InsertpsMask);
Elena Demikhovsky wrote:
> You can't insert V into V. if you want to "copy" 3 elements and insert 1, you should write
> (INSERTPS, dl, VT, V, scalar_to_vector(elt), index)
> If you want to copy 2 elements and insert 2, you can't use INSERTPS at all
For now, this optimization only deals with inserting 0 into vectors. For that case it is acceptable to insert V into V (with countD == countS == the index of one of the elements that won't be turned into 0).
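
To make the V-into-V argument concrete, here is a minimal software model of insertps with register operands, using the immediate layout from the Intel SDM (bits 7:6 select the source lane, bits 5:4 the destination lane, bits 3:0 the lanes to zero). The names makeZeroingMask and insertps are made up for this sketch and are not part of the patch, and it assumes bit i of NonZeros is set when element i is non-zero.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Builds the immediate the same way the patch does. Assumption: bit i of
// NonZeros is set when element i of the build_vector is non-zero.
constexpr uint8_t makeZeroingMask(unsigned NonZeros, unsigned FirstNonZeroIdx) {
  return static_cast<uint8_t>(FirstNonZeroIdx << 6 | FirstNonZeroIdx << 4 |
                              (~NonZeros & 0xf));
}

// Software model of `insertps` with a register source (hypothetical helper).
// Copies Src[CountS] into Dst[CountD], then zeroes every lane set in ZMask.
inline Vec4 insertps(Vec4 Dst, const Vec4 &Src, uint8_t Imm) {
  const unsigned CountS = (Imm >> 6) & 3;
  const unsigned CountD = (Imm >> 4) & 3;
  const unsigned ZMask = Imm & 0xf;
  Dst[CountD] = Src[CountS];
  for (unsigned I = 0; I < 4; ++I)
    if (ZMask & (1u << I))
      Dst[I] = 0.0f;
  return Dst;
}
```

With Src and Dst both equal to V and CountS == CountD, the copy step writes a lane onto itself, so the only visible effect is the zeroing done by ZMask.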

In the future, it will have to be changed to insert a V0 into a V1 (or vice versa), with a special case for when we're moving an element within V0,
e.g. (x,y,z,w) -> (x,z,z,w) or (x,y,z,w) -> (x,0,0,x).

We use insertps only when NumNonZero (which was counted in LowerBUILD_VECTOR and is the number of elements that are neither zero nor undef) is equal to CorrectIdx (which is the number of elements from V that are inserted in the new vector at the same index they had in V). When that holds, every non-zero element is already in its final lane, so we can use insertps with V as both vector operands.
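
The condition can be illustrated with a toy model that does not touch SelectionDAG at all. Everything here (Elem, countCorrectIdx, countNonZero, canInsertVIntoV) is a hypothetical stand-in for illustration, not code from the patch.

```cpp
#include <vector>

// Hypothetical stand-in for one build_vector operand.
struct Elem {
  enum Kind { Undef, Zero, FromV };
  Kind kind;
  unsigned srcIdx; // lane of V the element was taken from (FromV only)
};

// Elements taken from V that land on the same lane they had in V.
inline unsigned countCorrectIdx(const std::vector<Elem> &Ops) {
  unsigned N = 0;
  for (unsigned I = 0; I < Ops.size(); ++I)
    if (Ops[I].kind == Elem::FromV && Ops[I].srcIdx == I)
      ++N;
  return N;
}

// Elements that are neither zero nor undef.
inline unsigned countNonZero(const std::vector<Elem> &Ops) {
  unsigned N = 0;
  for (const Elem &E : Ops)
    if (E.kind == Elem::FromV)
      ++N;
  return N;
}

// The V-into-V form is usable only when no element has to move.
inline bool canInsertVIntoV(const std::vector<Elem> &Ops) {
  return countCorrectIdx(Ops) == countNonZero(Ops);
}
```

In this model (x,y,0,0) passes the check, while (x,z,z,w) and (x,0,0,x) fail it because one element would have to change lanes.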

Comment at: test/CodeGen/X86/sse41.ll:353
@@ +352,3 @@
+  %vecinit4 = insertelement <4 x float> %vecinit3, float 0.0, i32 3
+  ret <4 x float> %vecinit4
Elena Demikhovsky wrote:
> What code is generated here?
  _shuf_XY00:                             ## @shuf_XY00
  ## BB#0:                                ## %entry
    insertps    $12, %xmm0, %xmm0 ## encoding: [0x66,0x0f,0x3a,0x21,0xc0,0x0c]
                                          ## xmm0 = xmm0[0,1],zero,zero
    retq                            ## encoding: [0xc3]
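
As a cross-check of the `$12` immediate, the following sketch decodes an insertps immediate for the case where source and destination are the same register. The helper name laneSources is made up for this example; it returns, for each result lane, the original lane it holds, or -1 if the lane is zeroed.

```cpp
#include <array>
#include <cstdint>

// For `insertps $Imm, %xmmN, %xmmN` (same register as source and destination),
// report which original lane each result lane holds, or -1 if it is zeroed.
inline std::array<int, 4> laneSources(uint8_t Imm) {
  const unsigned CountS = (Imm >> 6) & 3; // bits 7:6: source lane
  const unsigned CountD = (Imm >> 4) & 3; // bits 5:4: destination lane
  const unsigned ZMask = Imm & 0xf;       // bits 3:0: lanes to zero
  std::array<int, 4> Lanes = {{0, 1, 2, 3}};
  Lanes[CountD] = static_cast<int>(CountS);
  for (unsigned I = 0; I < 4; ++I)
    if (ZMask & (1u << I))
      Lanes[I] = -1;
  return Lanes;
}
```

For an immediate of 12 this gives lanes 0 and 1 unchanged and lanes 2 and 3 zeroed, which agrees with the `xmm0 = xmm0[0,1],zero,zero` comment in the generated assembly.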

