[PATCH] [X86][SSE] Vectorize v2i32 to v2f64 conversions
Andrea Di Biagio
Andrea_DiBiagio at sn.scee.net
Mon Jun 15 11:34:44 PDT 2015
Hi Simon,
I am not sure this is the best way to fix this issue.
In particular, I wonder if there is an alternative approach that doesn't involve adding a new target opcode.
At least, on AVX, you can have a canonicalization rule that converts the following dag node sequence:
v4i32: A = ...
v2i32: B = extract_subvector A, 0
v2f64: C = sint_to_fp B
into:
v4i32: A = ...
v4f64: B = sint_to_fp A
v2f64: C = extract_subvector B, 0
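The rewrite above is only valid because sint_to_fp works lane by lane, so converting all four lanes and then taking the low half gives the same result as taking the low half first. A minimal sketch of that reasoning in plain C++ follows; it models the DAG nodes with std::array and is not LLVM code. The helper names (sint_to_fp, extract_low2, narrow_then_convert, convert_then_narrow, check_equivalence) are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane-wise model of sint_to_fp: each output lane depends only on the
// matching input lane.
template <std::size_t N>
std::array<double, N> sint_to_fp(const std::array<std::int32_t, N> &in) {
  std::array<double, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<double>(in[i]);
  return out;
}

// Model of "extract_subvector V, 0" producing the low two lanes.
template <typename T>
std::array<T, 2> extract_low2(const std::array<T, 4> &v) {
  return {v[0], v[1]};
}

// Original sequence: extract first, then convert (v2i32 -> v2f64).
std::array<double, 2> narrow_then_convert(const std::array<std::int32_t, 4> &a) {
  return sint_to_fp(extract_low2(a));
}

// Canonicalized sequence: convert first (v4i32 -> v4f64), then extract.
std::array<double, 2> convert_then_narrow(const std::array<std::int32_t, 4> &a) {
  return extract_low2(sint_to_fp(a));
}

// Hypothetical self-check: both orders agree on the low two lanes.
void check_equivalence() {
  const std::array<std::int32_t, 4> a{-7, 42, 100, -100};
  assert(narrow_then_convert(a) == convert_then_narrow(a));
}
```

Every i32 value is exactly representable as an f64, so the two orders agree bit for bit and no rounding concern arises from widening the conversion.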
Then, I think you can add an ISel pattern to match a VCVTDQ2PDrr from a:
(v2f64 (extract_subvector (v4f64 (sint_to_fp v4i32:%V)), 0)).
Unfortunately, the combine rule above would not fix the problem for non-AVX targets.
On those targets you will end up with a dag that looks like this:
v2f64 = build_vector (f64 (sint_to_fp i32:A)), (f64 (sint_to_fp i32:B))
Where:
A: i32 = extract_vector_elt %InVec, i64 0
B: i32 = extract_vector_elt %InVec, i64 1
I am not sure if this would be a good approach, but I think one way to fix this is to add a (quite long) ISel pattern to match that sequence and select a CVTDQ2PDrr.
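To make the target of such a pattern concrete, here is a small sketch that compares the scalarized form (two scalar conversions feeding a build_vector) with a single packed conversion. It assumes the SSE2 intrinsic _mm_cvtepi32_pd, which corresponds to cvtdq2pd, and falls back to scalar code when SSE2 is not available. The function names are hypothetical and this is not the proposed ISel pattern itself.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalarized form: one scalar conversion per extracted element,
// mirroring the build_vector of two (f64 (sint_to_fp i32)) nodes.
void convert_scalarized(const std::int32_t in[4], double out[2]) {
  out[0] = static_cast<double>(in[0]);
  out[1] = static_cast<double>(in[1]);
}

// Packed form: a single conversion of the low two i32 lanes.
void convert_packed(const std::int32_t in[4], double out[2]) {
#if defined(__SSE2__)
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(in));
  _mm_storeu_pd(out, _mm_cvtepi32_pd(v)); // maps to cvtdq2pd
#else
  convert_scalarized(in, out); // portable fallback, assumption for non-x86 builds
#endif
}

// Hypothetical self-check: both forms produce the same two doubles.
void check_packed_matches_scalarized() {
  const std::int32_t in[4] = {-3, 1000000, 77, 88};
  double a[2], b[2];
  convert_scalarized(in, a);
  convert_packed(in, b);
  assert(a[0] == b[0] && a[1] == b[1]);
}
```

The upper two input lanes (77 and 88 in the self-check) are ignored by both forms, which is what lets a pattern over the low two elements select the packed instruction.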
I hope this helps.
Andrea
REPOSITORY
rL LLVM
================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:11
@@ -10,3 +10,3 @@
; SSE2: # BB#0:
; SSE2-NEXT: movd %xmm0, %rax
; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
----------------
I know that this is unrelated to your patch, but I noticed that on SSE2, this i64 extract element has been expanded to 'movd'. Shouldn't this be a 'movq' instead?
================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:14
@@ -13,3 +13,3 @@
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; SSE2-NEXT: movd %xmm0, %rax
; SSE2-NEXT: xorps %xmm0, %xmm0
----------------
Same as above.
Although this is unrelated to your patch, I think this should be 'movq'. Otherwise, we end up losing the upper half of the input quadword.
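To illustrate the concern, the sketch below models what would happen if only the low 32 bits of the lane reached the general-purpose register before the 64-bit conversion. It is a hypothetical model in plain C++ and says nothing about what the emitted 'movd %xmm0, %rax' actually encodes; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Full quadword transfer followed by a signed i64 -> f64 conversion.
double convert_full(std::int64_t lane) {
  return static_cast<double>(lane);
}

// Hypothetical 32-bit-only transfer: the upper half of the quadword is
// dropped (zero-extended) before the same i64 -> f64 conversion.
double convert_low32_only(std::int64_t lane) {
  const std::uint64_t low = static_cast<std::uint64_t>(lane) & 0xFFFFFFFFull;
  return static_cast<double>(static_cast<std::int64_t>(low));
}

// Hypothetical self-check: values that fit in the low half survive,
// anything using the upper half does not.
void check_upper_half_matters() {
  assert(convert_full(5) == convert_low32_only(5));
  assert(convert_full((std::int64_t{1} << 40) + 5) != convert_low32_only((std::int64_t{1} << 40) + 5));
}
```

Small non-negative inputs give the same result either way, so a test with such values would not expose the difference; negative values and values above 32 bits would.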
================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:26
@@ -25,3 +25,3 @@
; AVX-NEXT: vmovq %xmm0, %rax
; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0
; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0
----------------
Again, this is unrelated to your patch, but this vxorps seems redundant. I haven't looked at the code, but I suspect that this may be caused by a sub-optimal build_vector lowering.
================
Comment at: test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll:30-31
@@ -29,3 +29,4 @@
; CHECK-LABEL: foo1:
; FIXME: The operation gets scalarized. If/when the compiler learns to better
; use [V]CVTDQ2PD, this will need updated.
+; CHECK: cvtdq2pd
----------------
You can get rid of that FIXME since you fixed it with this patch :-)
http://reviews.llvm.org/D10433