[PATCH] [X86] Improved lowering of v4x32 build_vector dag nodes.
Andrea Di Biagio
Andrea_DiBiagio at sn.scee.net
Tue Nov 18 11:33:02 PST 2014
Hi qcolombet, nadav, grosbach, delena,
Hi Quentin, Nadav (and all),
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes to blend/insertps.
In particular, this patch improves the function 'LowerBuildVectorv4x32', which works under the following preconditions:
- the build_vector in input is not a build_vector of all-zeros;
- the build_vector in input has at least one non-zero element.
This patch improves the previous behavior as follows:
1) A build_vector that performs a blend with a zero vector is converted to a shuffle.
2) We now identify more opportunities to lower a build_vector into an insertps with zero masking.
About 1), this is to let the shuffle legalizer expand the dag node in an optimal way. In particular, this helps improve the codegen in cases where an insertps is selected instead of a movq or a blend (see the differences in tests sse41.ll and sse2.ll).
About 2), we now get much better codegen in all the new test cases added in sse41.ll.
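To make point 1) concrete, here is a minimal Python sketch of how a 4-element build_vector that blends one source vector with zero could be described as a shuffle mask. It is only a model of the idea, not the code in X86ISelLowering.cpp: the element representation, the ZERO marker and the function name blend_with_zero_mask are all hypothetical. Mask indices 0-3 select lanes of the source vector and 4-7 select lanes of a zero vector, following the usual two-operand shuffle convention.

```python
from typing import List, Optional, Sequence

ZERO = "zero"  # hypothetical marker for a constant-zero element


def blend_with_zero_mask(elts: Sequence[object]) -> Optional[List[int]]:
    """Return a shuffle mask over (V, zero vector), or None.

    Each entry of `elts` is either ZERO or a (source_name, lane) pair
    standing for an element extracted from a vector. A mask is returned
    only when every non-zero element comes from the same source vector
    and stays in its original lane, i.e. the build_vector is a blend
    of that vector with zero.
    """
    if len(elts) != 4:
        return None
    source = None
    mask: List[int] = []
    for i, elt in enumerate(elts):
        if elt == ZERO:
            mask.append(4 + i)  # lane i of the second (zero) operand
            continue
        src, lane = elt
        if lane != i:
            return None  # element moves lanes: not a plain blend
        if source is None:
            source = src
        elif src != source:
            return None  # more than one non-zero source vector
        mask.append(i)  # lane i of the first operand
    if source is None:
        return None  # all-zero build_vector is excluded by the preconditions
    return mask


if __name__ == "__main__":
    print(blend_with_zero_mask([("A", 0), ZERO, ("A", 2), ZERO]))
```

A build_vector that takes elements from two different vectors, or moves an element to a different lane, is rejected by this sketch; that is the kind of case point 2) targets with insertps.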
For example:
;;
define <4 x float> @insertps_7(<4 x float> %A, <4 x float> %B) #0 {
entry:
%vecext = extractelement <4 x float> %A, i32 0
%vecinit = insertelement <4 x float> undef, float %vecext, i32 0
%vecinit1 = insertelement <4 x float> %vecinit, float 0.000000e+00, i32 1
%vecext2 = extractelement <4 x float> %B, i32 1
%vecinit3 = insertelement <4 x float> %vecinit1, float %vecext2, i32 2
%vecinit4 = insertelement <4 x float> %vecinit3, float 0.000000e+00, i32 3
ret <4 x float> %vecinit4
}
;;
Before this patch, the backend generated the following assembly:
shufps $-27, %xmm1, %xmm1
xorps %xmm2, %xmm2
blendps $14, %xmm2, %xmm0
blendps $14, %xmm2, %xmm1
unpcklpd %xmm1, %xmm0
retq
With this patch, the backend correctly lowers the build_vector to a single insertps:
insertps $170, %xmm1, %xmm0 # xmm0 = xmm0[0],zero,xmm1[1],zero
retq
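For readers unfamiliar with insertps, the following Python sketch models its 8-bit immediate as documented by Intel: bits 7:6 select the source lane, bits 5:4 select the destination lane, and bits 3:0 are a zero mask applied after the insert. The function names are hypothetical and the sketch only emulates the lane semantics on Python lists. It encodes the lane selection given by the comment and the IR above (xmm1[1] inserted into lane 2, lanes 1 and 3 zeroed). Under this layout that selection encodes as 106 (0x6A), while 170 decodes to source lane 2, so the immediate and the comment in the quoted output do not appear to agree; the sketch follows the comment.

```python
from typing import List, Sequence, Tuple


def insertps_imm(src_lane: int, dst_lane: int, zero_mask: int) -> int:
    """Pack the three insertps fields into an 8-bit immediate."""
    assert 0 <= src_lane < 4 and 0 <= dst_lane < 4 and 0 <= zero_mask < 16
    return (src_lane << 6) | (dst_lane << 4) | zero_mask


def decode_insertps_imm(imm: int) -> Tuple[int, int, int]:
    """Split an immediate into (src_lane, dst_lane, zero_mask)."""
    return (imm >> 6) & 3, (imm >> 4) & 3, imm & 0xF


def insertps(dst: Sequence[float], src: Sequence[float], imm: int) -> List[float]:
    """Emulate insertps on two 4-lane vectors held as Python sequences."""
    src_lane, dst_lane, zero_mask = decode_insertps_imm(imm)
    out = list(dst)
    out[dst_lane] = src[src_lane]  # insert the selected source lane
    # The zero mask is applied last and can clear any lane.
    return [0.0 if (zero_mask >> i) & 1 else out[i] for i in range(4)]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]  # stands in for %A / xmm0
    b = [5.0, 6.0, 7.0, 8.0]  # stands in for %B / xmm1
    imm = insertps_imm(src_lane=1, dst_lane=2, zero_mask=0b1010)
    print(imm, insertps(a, b, imm))
```

With the sample inputs, the result is A[0], zero, B[1], zero, which matches what the IR for insertps_7 builds.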
Please let me know if ok to submit.
Thanks,
Andrea
http://reviews.llvm.org/D6311
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/sse2.ll
test/CodeGen/X86/sse41.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6311.16345.patch
Type: text/x-patch
Size: 15475 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141118/abfc695a/attachment.bin>