[PATCH] Lower certain build_vectors to insertps instructions

Sat Apr 26 23:49:17 PDT 2014

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5418
@@ +5417,3 @@
+  SDValue V = FirstNonZero.getOperand(0);
+  unsigned CorrectIdx = cast<ConstantSDNode>(FirstNonZero.getOperand(1))
+                            ->getZExtValue() == FirstNonZeroIdx;
----------------
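CorrectIdx is effectively a boolean here (1 or 0), because it holds the result of the == comparison. Further down you apply ++ to it, which looks wrong for a flag.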
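
To illustrate the concern, here is a minimal standalone sketch. It is not the patch's code: `flagThenIncrement`, `trackSeparately` and `IdxState` are hypothetical names, and I am assuming the later ++ is meant to count something rather than flip the flag.

```cpp
#include <cassert>

// Hypothetical stand-in for the pattern in the hunk: the result of a
// comparison (0 or 1) is stored in an unsigned and later incremented.
unsigned flagThenIncrement(unsigned ExtractIdx, unsigned FirstNonZeroIdx) {
  unsigned CorrectIdx = ExtractIdx == FirstNonZeroIdx; // 0 or 1
  ++CorrectIdx;                                        // now 1 or 2
  return CorrectIdx;
}

// One possible alternative (an assumption about the intent): keep the
// flag and the counter in separate, explicitly typed variables.
struct IdxState {
  bool FirstMatches;    // does the extract index match the position?
  unsigned NumMatching; // how many elements matched so far
};

IdxState trackSeparately(unsigned ExtractIdx, unsigned FirstNonZeroIdx) {
  bool FirstMatches = ExtractIdx == FirstNonZeroIdx;
  unsigned NumMatching = FirstMatches ? 1u : 0u;
  return {FirstMatches, NumMatching};
}
```

With the first form, a reader cannot tell from the type whether the value is a flag or a count; with the second, the increment can only apply to the counter.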

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5432
@@ +5431,3 @@
+    // ex: Getting one element from a vector, and the rest from another.
+    if (Elem.getOperand(0) != V)
+      return SDValue();
----------------
It looks like you are checking for a splat vector here. But if you want to use INSERTPS, your build_vector should contain only one non-zero element.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6204
@@ +6203,3 @@
+  if (EVTBits == 32) {
+    SDValue V = LowerBuildVectorv4x32(Op, NumElems, NonZeros, NumNonZero,
+                                      NumZero, DAG, Subtarget, *this);
----------------
What happens with 8x32 and 16x32 vectors here?

================
Comment at: test/CodeGen/X86/sse41.ll:331
@@ +330,3 @@
+; CHECK: ret
+  %vecext = extractelement <4 x float> %x, i32 0
+  %vecinit = insertelement <4 x float> undef, float %vecext, i32 0
----------------
Could you please explain what code you expect to see here? Is it just a single insertps instruction?
Usually we see this kind of extract-insert chain in a matrix transpose, but in this case the elements are extracted from different vectors.
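
For reference when answering, here is a scalar model of what a single register-form INSERTPS does, as I understand the imm8 encoding (bits 7:6 select the source lane, bits 5:4 the destination lane, bits 3:0 are the zero mask). `V4F` and `insertpsModel` are hypothetical names for this sketch only; it is not code from the patch or from LLVM.

```cpp
#include <array>
#include <cstdint>

using V4F = std::array<float, 4>;

// Scalar model of register-form INSERTPS (illustrative only).
V4F insertpsModel(V4F Dst, const V4F &Src, std::uint8_t Imm) {
  unsigned CountS = (Imm >> 6) & 3; // source lane to read
  unsigned CountD = (Imm >> 4) & 3; // destination lane to overwrite
  unsigned ZMask = Imm & 0xF;       // lanes to force to zero

  // Copy one element from Src into the selected lane of Dst.
  Dst[CountD] = Src[CountS];

  // Apply the zero mask after the insert.
  for (unsigned I = 0; I != 4; ++I)
    if (ZMask & (1u << I))
      Dst[I] = 0.0f;
  return Dst;
}
```

For example, `insertpsModel({1, 2, 3, 4}, {5, 6, 7, 8}, 0x9C)` copies source lane 2 into destination lane 1 and zeroes lanes 2 and 3. One instruction can therefore take a single element from a second vector and zero some lanes, but it cannot gather elements from several different vectors.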

http://reviews.llvm.org/D3521