[PATCH] Lower certain build_vectors to insertps instructions
Elena Demikhovsky
elena.demikhovsky at intel.com
Sun Apr 27 02:06:10 PDT 2014
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5423
@@ +5422,3 @@
+ SDValue Elem = Op.getOperand(Idx);
+ if (Elem.getOpcode() == ISD::UNDEF || X86::isZeroNode(Elem))
+ continue;
----------------
If it is a zero node, somebody still has to take care of it. I don't understand how your tests with zeroes work.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5447
@@ +5446,3 @@
+ FirstNonZeroIdx << 6 | FirstNonZeroIdx << 4 | (~NonZeros & 0xf));
+ return DAG.getNode(X86ISD::INSERTPS, dl, VT, V, V, InsertpsMask);
+}
----------------
You can't insert V into V. If you want to "copy" 3 elements and insert 1, you should write
(INSERTPS, dl, VT, V, scalar_to_vector(elt), index)
If you want to copy 2 elements and insert 2, you can't use INSERTPS at all.
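
For reference, here is a minimal scalar model of how the INSERTPS immediate is decoded (register form): bits 7:6 select the source element, bits 5:4 select the destination lane, and bits 3:0 are the zero mask applied after the copy. This is only a sketch; the names Vec4, makeInsertpsImm and emulateInsertps are hypothetical and are not part of LLVM or of the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a <4 x float> value.
using Vec4 = std::array<float, 4>;

// Pack the three INSERTPS immediate fields:
//   bits 7:6 = source element, bits 5:4 = destination lane, bits 3:0 = zero mask.
inline uint8_t makeInsertpsImm(unsigned SrcIdx, unsigned DstIdx,
                               unsigned ZeroMask) {
  assert(SrcIdx < 4 && DstIdx < 4 && ZeroMask < 16 && "field out of range");
  return static_cast<uint8_t>(SrcIdx << 6 | DstIdx << 4 | ZeroMask);
}

// Scalar model of "INSERTPS Dst, Src, Imm" for register operands.
inline Vec4 emulateInsertps(Vec4 Dst, const Vec4 &Src, uint8_t Imm) {
  unsigned SrcIdx = (Imm >> 6) & 3;
  unsigned DstIdx = (Imm >> 4) & 3;
  // Exactly one element is copied from Src into Dst.
  Dst[DstIdx] = Src[SrcIdx];
  // The zero mask is applied last, so it can also clear the inserted lane.
  for (unsigned I = 0; I < 4; ++I)
    if (Imm & (1u << I))
      Dst[I] = 0.0f;
  return Dst;
}
```

For example, emulateInsertps(V, S, makeInsertpsImm(0, 2, 0x8)) copies S[0] into lane 2 of V and zeroes lane 3. In this model a single instruction moves only one element from the second operand; every other lane is either kept from the first operand or zeroed.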
================
Comment at: test/CodeGen/X86/sse41.ll:353
@@ +352,3 @@
+ %vecinit4 = insertelement <4 x float> %vecinit3, float 0.0, i32 3
+ ret <4 x float> %vecinit4
+}
----------------
What code is generated here?
http://reviews.llvm.org/D3521