[PATCH] Lower certain build_vectors to insertps instructions
Elena Demikhovsky
elena.demikhovsky at intel.com
Sat Apr 26 23:49:17 PDT 2014
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5418
@@ +5417,3 @@
+ SDValue V = FirstNonZero.getOperand(0);
+ unsigned CorrectIdx = cast<ConstantSDNode>(FirstNonZero.getOperand(1))
+ ->getZExtValue() == FirstNonZeroIdx;
----------------
CorrectIdx is effectively a boolean here (1 or 0), since it holds the result of the == comparison. Later on you apply ++ to it.
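A minimal standalone sketch of the concern, not the patch's code. The function names are hypothetical, and plain integers stand in for the ConstantSDNode value and FirstNonZeroIdx:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pattern in the patch: the result of the
// comparison (a bool) is converted to unsigned, so it can only be 0 or 1.
unsigned indexFromComparison(uint64_t OperandIdx, unsigned FirstNonZeroIdx) {
  unsigned CorrectIdx = OperandIdx == FirstNonZeroIdx; // bool -> 0 or 1
  return CorrectIdx;
}

// Hypothetical alternative, assuming the intent was to keep the index
// itself: take the operand value and compare it separately where needed.
unsigned indexFromOperand(uint64_t OperandIdx) {
  return static_cast<unsigned>(OperandIdx);
}
```

With an operand index of 2 and FirstNonZeroIdx of 2, the first function yields 1 and the second yields 2, so any later ++ starts from a different value in each case.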
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5432
@@ +5431,3 @@
+ // ex: Getting one element from a vector, and the rest from another.
+ if (Elem.getOperand(0) != V)
+ return SDValue();
----------------
It looks like you are checking for a splat vector here. But if you want to use INSERTPS, your build_vector should include only one non-zero element.
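For reference while discussing this, a small software model of the register form of SSE4.1 INSERTPS. It assumes the documented imm8 layout (bits 7:6 select the source element, bits 5:4 the destination lane, bits 3:0 the zero mask). The names Vec4 and insertpsModel are made up for this sketch and do not exist in LLVM:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Software model of INSERTPS xmm1, xmm2, imm8 (register form).
// One element of Src is copied into one lane of Dst, then every lane
// whose bit is set in the low four bits of Imm is forced to zero.
Vec4 insertpsModel(Vec4 Dst, const Vec4 &Src, uint8_t Imm) {
  unsigned SrcIdx = (Imm >> 6) & 3; // bits 7:6: which source element
  unsigned DstIdx = (Imm >> 4) & 3; // bits 5:4: which destination lane
  Dst[DstIdx] = Src[SrcIdx];
  for (unsigned I = 0; I < 4; ++I)
    if (Imm & (1u << I))            // bits 3:0: zero mask, applied last
      Dst[I] = 0.0f;
  return Dst;
}
```

In this model a single instruction places exactly one element from the source; the other lanes either keep the destination's value or are zeroed.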
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6204
@@ +6203,3 @@
+ if (EVTBits == 32) {
+ SDValue V = LowerBuildVectorv4x32(Op, NumElems, NonZeros, NumNonZero,
+ NumZero, DAG, Subtarget, *this);
----------------
What happens for 8x32 and 16x32 vectors here?
================
Comment at: test/CodeGen/X86/sse41.ll:331
@@ +330,3 @@
+; CHECK: ret
+ %vecext = extractelement <4 x float> %x, i32 0
+ %vecinit = insertelement <4 x float> undef, float %vecext, i32 0
----------------
Could you please explain what code you expect to see here? Is it only one insertps instruction?
Usually we see such an extract-insert chain in a matrix transpose. But in this case the elements are extracted from different vectors.
http://reviews.llvm.org/D3521