[PATCH] Lower certain build_vectors to insertps instructions

Elena Demikhovsky elena.demikhovsky at intel.com
Sun Apr 27 02:06:10 PDT 2014


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5423
@@ +5422,3 @@
+    SDValue Elem = Op.getOperand(Idx);
+    if (Elem.getOpcode() == ISD::UNDEF || X86::isZeroNode(Elem))
+      continue;
----------------
If the element is a zero node, something still has to take care of it. I don't understand how your tests with zeroes work.
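
For reference, one possible reading of the patch is that the skipped lanes are meant to be covered by the low four bits of the immediate built further down (`~NonZeros & 0xf`), which INSERTPS treats as a zero mask. The sketch below only models that arithmetic. The `Lane` enum and the `buildInsertpsImm` helper are hypothetical names for illustration, not code from the patch, and the assumption that undef lanes are folded into the zero mask comes from the `continue` in the hunk above.

```cpp
#include <array>
#include <cassert>

// Hypothetical lane classification standing in for the
// ISD::UNDEF / X86::isZeroNode checks in the hunk above.
enum class Lane { Undef, Zero, NonZero };

// Builds an INSERTPS-style imm8 using the same expression as the patch:
// bits [7:6] source lane, bits [5:4] destination lane, bits [3:0] zero mask.
unsigned buildInsertpsImm(const std::array<Lane, 4> &Lanes) {
  unsigned NonZeros = 0;
  int FirstNonZeroIdx = -1;
  for (unsigned Idx = 0; Idx < 4; ++Idx) {
    if (Lanes[Idx] != Lane::NonZero)
      continue; // undef and zero lanes are skipped here
    NonZeros |= 1u << Idx;
    if (FirstNonZeroIdx < 0)
      FirstNonZeroIdx = static_cast<int>(Idx);
  }
  assert(FirstNonZeroIdx >= 0 && "expected at least one non-zero lane");
  unsigned First = static_cast<unsigned>(FirstNonZeroIdx);
  // Every lane that was skipped above ends up in the zero mask.
  return First << 6 | First << 4 | (~NonZeros & 0xf);
}
```

Under this reading, lanes {x, y, z, 0.0} give a low nibble of 0x8, so lane 3 would be zeroed by the instruction itself rather than by a separate node.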

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5447
@@ +5446,3 @@
+      FirstNonZeroIdx << 6 | FirstNonZeroIdx << 4 | (~NonZeros & 0xf));
+  return DAG.getNode(X86ISD::INSERTPS, dl, VT, V, V, InsertpsMask);
+}
----------------
You can't insert V into V. If you want to "copy" 3 elements and insert 1, you should write
(INSERTPS, dl, VT, V, scalar_to_vector(elt), index)

If you want to copy 2 elements and insert 2, you can't use INSERTPS at all.
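
To make the constraint concrete, here is a minimal software model of the register form of INSERTPS, following the immediate layout in the Intel manual: bits [7:6] select the source lane, bits [5:4] select the destination lane, and bits [3:0] are a zero mask. The `V4` alias and the `insertps` function are illustrative names, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<float, 4>;

// Software model of "INSERTPS dst, src, imm8" (register form).
V4 insertps(V4 Dst, const V4 &Src, std::uint8_t Imm) {
  unsigned SrcIdx = (Imm >> 6) & 3; // bits [7:6]: source lane to read
  unsigned DstIdx = (Imm >> 4) & 3; // bits [5:4]: destination lane to overwrite
  unsigned ZMask = Imm & 0xf;       // bits [3:0]: lanes forced to 0.0
  assert(SrcIdx < 4 && DstIdx < 4);
  Dst[DstIdx] = Src[SrcIdx];        // exactly one lane is taken from Src
  for (unsigned I = 0; I < 4; ++I)
    if (ZMask & (1u << I))
      Dst[I] = 0.0f;                // zeroing is applied after the insert
  return Dst;
}
```

In this model a single instruction moves exactly one lane from the source. Every other lane is either kept from the destination or zeroed, which is why inserting two distinct non-zero elements does not fit in one INSERTPS.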


================
Comment at: test/CodeGen/X86/sse41.ll:353
@@ +352,3 @@
+  %vecinit4 = insertelement <4 x float> %vecinit3, float 0.0, i32 3
+  ret <4 x float> %vecinit4
+}
----------------
What code is generated here?

http://reviews.llvm.org/D3521





