[PATCH] [X86][SSE] Vectorize v2i32 to v2f64 conversions

Mon Jun 15 11:34:44 PDT 2015

Hi Simon,

I am not sure this is the best way to fix this issue.
In particular, I wonder if there is an alternative approach that doesn't involve adding a new target opcode.

At least on AVX, you could add a canonicalization rule that converts the following DAG node sequence:

  v4i32: A = ...
  v2i32: B = extract_subvector A, 0
  v2f64: C = sint_to_fp B

into:

  v4i32: A = ...
  v4f64: B = sint_to_fp A
  v2f64: C = extract_subvector B, 0

Then, I think you can add an ISel pattern that selects VCVTDQ2PDrr from:

  (v2f64 (extract_subvector (v4f64 (sint_to_fp v4i32:%V)), 0))

Unfortunately, the combine rule above would not fix the problem for non-AVX targets.
On those targets you will end up with a DAG that looks like this:

  v2f64 = build_vector (f64 (sint_to_fp i32:A)), (f64 (sint_to_fp i32:B))

Where:

  A: i32 = extract_vector_elt %InVec, i64 0
  B: i32 = extract_vector_elt %InVec, i64 1

I am not sure whether this would be a good approach, but I think one way to fix this is to add a (quite long) ISel pattern that matches that sequence and selects CVTDQ2PDrr.

I hope it helps.
Andrea


REPOSITORY
  rL LLVM

================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:11
@@ -10,3 +10,3 @@
 ; SSE2:       # BB#0:
 ; SSE2-NEXT:    movd %xmm0, %rax
 ; SSE2-NEXT:    cvtsi2sdq %rax, %xmm1
----------------
I know that this is unrelated to your patch, but I noticed that on SSE2 this i64 extract_vector_elt has been lowered to 'movd'. Shouldn't this be 'movq' instead?

================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:14
@@ -13,3 +13,3 @@
 ; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
 ; SSE2-NEXT:    movd %xmm0, %rax
 ; SSE2-NEXT:    xorps %xmm0, %xmm0
----------------
Same as above.
Although this is unrelated to your patch, I think this should be 'movq'. Otherwise, we end up losing the upper half of the input quadword.

================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:26
@@ -25,3 +25,3 @@
 ; AVX-NEXT:    vmovq %xmm0, %rax
 ; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    vcvtsi2sdq %rax, %xmm0, %xmm0
----------------
Again, this is unrelated to your patch, but this vxorps seems redundant. I haven't looked at the code, but I suspect that it may be caused by a sub-optimal build_vector lowering.

================
Comment at: test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll:30-31
@@ -29,3 +29,4 @@
 ; CHECK-LABEL: foo1:
 ;   FIXME: The operation gets scalarized. If/when the compiler learns to better
 ;          use [V]CVTDQ2PD, this will need updated.
+; CHECK: cvtdq2pd
----------------
You can get rid of that FIXME since you fixed it with this patch :-)

http://reviews.llvm.org/D10433
