[PATCH] [X86] Improved lowering of v4x32 build_vector dag nodes.

Tue Nov 18 11:33:02 PST 2014

Hi qcolombet, nadav, grosbach, delena,

Hi Quentin, Nadav (and all),

This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes to blend/insertps.
In particular, it improves the function 'LowerBuildVectorv4x32', which works under the following preconditions:
 - the input build_vector is not a build_vector of all-zeros;
 - the input build_vector has at least one non-zero element.

This patch improves the previous behavior as follows:
 1) A build_vector that performs a blend with a zero vector is converted to a shuffle.
 2) We now identify more opportunities to lower a build_vector into an insertps with zero masking.

About 1), this is to let the shuffle legalizer expand the dag node in an optimal way. In particular, this helps improve the codegen in cases where an insertps is selected instead of a movq or a blend (see the differences in tests sse41.ll and sse2.ll).
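
To make point 1) concrete, here is a minimal lane-level sketch in Python of how a build_vector that blends a source vector with zeros can be described as a two-input shuffle mask. It is only an illustration under stated assumptions: the names `ZERO`, `blend_with_zero_mask` and `apply_shuffle` are hypothetical and are not LLVM APIs, and the mask convention (0..3 picks from the first input, 4..7 from the second) follows the usual two-operand shuffle numbering.

```python
# Hypothetical lane-level model; none of these names exist in LLVM.
ZERO = None  # marks a lane that the build_vector fills with constant zero


def blend_with_zero_mask(lanes):
    """Return a 4-entry shuffle mask over (source, zero_vector).

    lanes[i] is either ZERO or the index of the source lane placed at
    position i. Mask values 0..3 pick from the source vector, 4..7 pick
    from the all-zeros vector. Returns None when the node is not a blend,
    i.e. when some source lane ends up at a different position.
    """
    mask = []
    for i, lane in enumerate(lanes):
        if lane is ZERO:
            mask.append(i + 4)  # lane i of the zero vector
        elif lane == i:
            mask.append(i)      # lane i of the source, kept in place
        else:
            return None         # a lane moves, so this is not a pure blend
    return mask


def apply_shuffle(v1, v2, mask):
    """Evaluate a two-input shuffle on plain Python lists."""
    both = list(v1) + list(v2)
    return [both[m] for m in mask]
```

For instance, `blend_with_zero_mask([0, ZERO, 2, ZERO])` yields the mask `[0, 5, 2, 7]`; a shuffle in this form can then be handed to whatever shuffle lowering is available instead of being expanded by hand.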

About 2), we now get much better codegen in all the new test cases added in sse41.ll.

For example:
;;
define <4 x float> @insertps_7(<4 x float> %A, <4 x float> %B) #0 {
entry:
  %vecext = extractelement <4 x float> %A, i32 0
  %vecinit = insertelement <4 x float> undef, float %vecext, i32 0
  %vecinit1 = insertelement <4 x float> %vecinit, float 0.000000e+00, i32 1
  %vecext2 = extractelement <4 x float> %B, i32 1
  %vecinit3 = insertelement <4 x float> %vecinit1, float %vecext2, i32 2
  %vecinit4 = insertelement <4 x float> %vecinit3, float 0.000000e+00, i32 3
  ret <4 x float> %vecinit4
}
;;

Before this patch, the backend generated the following assembly:
  shufps $-27, %xmm1, %xmm1
  xorps %xmm2, %xmm2
  blendps $14, %xmm2, %xmm0
  blendps $14, %xmm2, %xmm1
  unpcklpd %xmm1, %xmm0
  retq
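
As a sanity check on what this sequence computes, the following Python sketch emulates the three instructions lane by lane (AT&T operand order: immediate, source, destination). The helper names are hypothetical, and the model only covers the register forms used here.

```python
def shufps(imm, src, dst):
    """Low two result lanes come from dst, high two from src."""
    imm &= 0xFF  # $-27 is the signed spelling of 0xE5
    return [dst[imm & 3], dst[(imm >> 2) & 3],
            src[(imm >> 4) & 3], src[(imm >> 6) & 3]]


def blendps(imm, src, dst):
    """Lane i is taken from src when bit i of imm is set, else from dst."""
    return [src[i] if (imm >> i) & 1 else dst[i] for i in range(4)]


def unpcklpd(src, dst):
    """Low 64 bits of dst followed by low 64 bits of src (two lanes each)."""
    return dst[:2] + src[:2]


def old_sequence(a, b):
    """Emulate the pre-patch instruction sequence on 4-lane lists."""
    xmm0, xmm1 = list(a), list(b)
    xmm1 = shufps(-27, xmm1, xmm1)   # xmm1 = B[1],B[1],B[2],B[3]
    xmm2 = [0.0] * 4                 # xorps %xmm2, %xmm2
    xmm0 = blendps(14, xmm2, xmm0)   # xmm0 = A[0],0,0,0
    xmm1 = blendps(14, xmm2, xmm1)   # xmm1 = B[1],0,0,0
    xmm0 = unpcklpd(xmm1, xmm0)      # xmm0 = A[0],0,B[1],0
    return xmm0
```

With A = [1.0, 2.0, 3.0, 4.0] and B = [5.0, 6.0, 7.0, 8.0], `old_sequence(A, B)` returns `[1.0, 0.0, 6.0, 0.0]`, which is the vector built by the IR above, at the cost of five instructions.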

With this patch, the backend correctly lowers the build_vector to insertps:
  insertps $106, %xmm1, %xmm0 # xmm0 = xmm0[0],zero,xmm1[1],zero
  retq
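
For readers less familiar with insertps, its 8-bit immediate packs three fields in the register-register form: bits 7:6 select the source lane, bits 5:4 select the destination lane, and bits 3:0 form a zero mask applied after the insertion. The Python sketch below models that encoding; `insertps_imm` and `insertps` are hypothetical helper names used only for illustration.

```python
def insertps_imm(src_lane, dst_lane, zero_mask):
    """Pack the insertps immediate: [7:6] source, [5:4] destination, [3:0] zero mask."""
    return ((src_lane & 3) << 6) | ((dst_lane & 3) << 4) | (zero_mask & 0xF)


def insertps(imm, src, dst):
    """Emulate register-register insertps on 4-lane Python lists."""
    src_lane = (imm >> 6) & 3
    dst_lane = (imm >> 4) & 3
    out = list(dst)
    out[dst_lane] = src[src_lane]  # insert one lane of src into dst
    # Zero masking happens last, so it can also clear the inserted lane.
    return [0.0 if (imm >> i) & 1 else out[i] for i in range(4)]
```

The result annotated above, xmm0[0],zero,xmm1[1],zero, corresponds to source lane 1, destination lane 2 and zero mask 0b1010, so `insertps(insertps_imm(1, 2, 0b1010), B, A)` produces [A[0], 0.0, B[1], 0.0] in a single instruction.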

Please let me know if this is OK to submit.
Thanks,
Andrea

http://reviews.llvm.org/D6311

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/sse2.ll
  test/CodeGen/X86/sse41.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6311.16345.patch
Type: text/x-patch
Size: 15475 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141118/abfc695a/attachment.bin>