[PATCH] Lower certain build_vectors to insertps instructions
Filipe Cabecinhas
filcab+llvm.phabricator at gmail.com
Sun Apr 27 11:42:33 PDT 2014
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5423
@@ +5422,3 @@
+ SDValue Elem = Op.getOperand(Idx);
+ if (Elem.getOpcode() == ISD::UNDEF || X86::isZeroNode(Elem))
+ continue;
----------------
Elena Demikhovsky wrote:
> If it is a zeroNode, somebody should take care for this. I don't understand how do your tests with zeroes work.
This tests whether this element of the build_vector is a zero node.
If it is, the optimization is still valid, since insertps can write 0 into any lane.
Alternatively, I can simply drop the check for zero or undef. That changes neither CorrectIdx nor the comparison between CorrectIdx and NumNonZero.
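
To illustrate why the early `continue` does not affect the count, here is a minimal standalone sketch. It is not the patch's code: `Kind`, `Elem`, and `countCorrectIdx` are hypothetical names, and it assumes CorrectIdx is incremented only for operands that come from V at the same lane index.

```cpp
#include <array>
#include <cassert>

// Hypothetical classification of a build_vector operand (not an LLVM type).
enum class Kind { Undef, Zero, FromV };

struct Elem {
  Kind kind;
  unsigned lane; // source lane in V; only meaningful when kind == FromV
};

// Counts operands that take V's element from the same lane they occupy.
// skipZeroOrUndef models the early `continue` from the patch.
inline unsigned countCorrectIdx(const std::array<Elem, 4> &ops,
                                bool skipZeroOrUndef) {
  unsigned correct = 0;
  for (unsigned idx = 0; idx < 4; ++idx) {
    const Elem &e = ops[idx];
    assert((e.kind != Kind::FromV || e.lane < 4) && "lane out of range");
    if (skipZeroOrUndef && (e.kind == Kind::Undef || e.kind == Kind::Zero))
      continue; // early-out only; such operands never match the test below
    if (e.kind == Kind::FromV && e.lane == idx)
      ++correct;
  }
  return correct;
}
```

Under that assumption, a zero or undef operand can never satisfy the same-lane test, so both settings of `skipZeroOrUndef` return the same count.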
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:5447
@@ +5446,3 @@
+ FirstNonZeroIdx << 6 | FirstNonZeroIdx << 4 | (~NonZeros & 0xf));
+ return DAG.getNode(X86ISD::INSERTPS, dl, VT, V, V, InsertpsMask);
+}
----------------
Elena Demikhovsky wrote:
> You can't insert V into V. if you want to "copy" 3 elements and insert 1, you should write
> (INSERTPS, dl, VT, V, scalar_to_vector(elt), index)
>
> If you want to copy 2 elements and insert 2, you can't use INSERTPS at all
>
For now, this optimization only handles inserting 0 into vectors. For that case it is acceptable to insert V into V (with countD == countS == the index of one of the elements that won't be turned into 0).
In the future, it will have to be changed to insert a V0 into a V1 (or vice versa), with a special case for when we're moving an element within V0,
e.g. (x,y,z,w) -> (x,z,z,w) or (x,y,z,w) -> (x,0,0,x), etc.
We use insertps only if NumNonZero (which was counted in LowerBUILD_VECTOR and is the number of elements that are neither zero nor undef) is equal to CorrectIdx (which is the number of elements from V that are inserted in the new vector at the same index they had in V). Since we know this, every surviving element is already in its final lane, and we can use insertps with V as both vector arguments.
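
A minimal scalar model of the above, as a sketch rather than the actual lowering code. `buildInsertpsMask` mirrors the mask expression from the patch; `Vec4` and the `insertps` function are hypothetical helpers that emulate the register form of the instruction (immediate bits 7:6 select the source lane, bits 5:4 the destination lane, bits 3:0 the lanes to zero).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Same expression as in the patch. nonZeros has bit i set when lane i
// must keep its value from V; the remaining lanes are zeroed.
inline uint8_t buildInsertpsMask(unsigned firstNonZeroIdx, unsigned nonZeros) {
  assert(firstNonZeroIdx < 4 && "lane index out of range");
  return static_cast<uint8_t>(firstNonZeroIdx << 6 | firstNonZeroIdx << 4 |
                              (~nonZeros & 0xf));
}

// Scalar emulation of insertps (register form): copy one lane of src into
// one lane of dst, then clear every lane selected by the zero mask.
inline Vec4 insertps(Vec4 dst, const Vec4 &src, uint8_t imm) {
  unsigned countS = (imm >> 6) & 3; // source lane
  unsigned countD = (imm >> 4) & 3; // destination lane
  unsigned zmask = imm & 0xf;       // one bit per lane to zero
  dst[countD] = src[countS];
  for (unsigned i = 0; i < 4; ++i)
    if (zmask & (1u << i))
      dst[i] = 0.0f;
  return dst;
}
```

With countS == countD and src == dst, the copy step writes a lane onto itself, so the only visible effect is the zero mask. For V = (x,y,z,w) and nonZeros = 0b0011, the model yields (x,y,0,0).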
================
Comment at: test/CodeGen/X86/sse41.ll:353
@@ +352,3 @@
+ %vecinit4 = insertelement <4 x float> %vecinit3, float 0.0, i32 3
+ ret <4 x float> %vecinit4
+}
----------------
Elena Demikhovsky wrote:
> What code is generated here?
_shuf_XY00:                             ## @shuf_XY00
        .cfi_startproc
## BB#0:                                ## %entry
        insertps $12, %xmm0, %xmm0      ## encoding: [0x66,0x0f,0x3a,0x21,0xc0,0x0c]
                                        ## xmm0 = xmm0[0,1],zero,zero
        retq                            ## encoding: [0xc3]
        .cfi_endproc
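
To read the `$12` immediate in that output, here is a small decoding sketch. `InsertpsImm`, `decodeInsertpsImm`, and `laneIsZeroed` are hypothetical helper names for illustration, not LLVM APIs; the field layout is that of the insertps immediate.

```cpp
#include <cassert>
#include <cstdint>

// Fields of the insertps 8-bit immediate.
struct InsertpsImm {
  unsigned countS; // bits 7:6, source lane
  unsigned countD; // bits 5:4, destination lane
  unsigned zmask;  // bits 3:0, lanes to zero
};

inline InsertpsImm decodeInsertpsImm(uint8_t imm) {
  return {(imm >> 6) & 3u, (imm >> 4) & 3u, imm & 0xfu};
}

// True when the given lane is cleared by the zero mask.
inline bool laneIsZeroed(const InsertpsImm &f, unsigned lane) {
  assert(lane < 4 && "lane index out of range");
  return (f.zmask & (1u << lane)) != 0;
}
```

Decoding 12 (0b00001100) gives countS = 0, countD = 0, and a zero mask covering lanes 2 and 3, which matches the `xmm0 = xmm0[0,1],zero,zero` comment.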
http://reviews.llvm.org/D3521