[PATCH] [X86][SSE] Vectorize v2i32 to v2f64 conversions
llvm-dev at redking.me.uk
Mon Jun 15 14:49:43 PDT 2015
Thanks guys for the reviews.
Andrea - I did investigate TableGen / ISel pattern approaches but couldn't manage to make anything work - it doesn't like the fact that v2i32 isn't a valid type. I believe that's why the X86ISD VFPEXT and VFPROUND node types were added in a similar manner. I considered giving the node a more general name, 'SINT_TO_FPEXT', but given that cvtdq2pd appears to be the only user of this pattern, it didn't seem necessary.
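For anyone following along, here is a rough C sketch of the operation being vectorized: two signed 32-bit integers sitting in the low half of an XMM register are converted to two doubles. It is only an illustration, not code from the patch. The helper name convert_2xi32_to_2xf64 is hypothetical, and the sketch assumes the SSE2 intrinsic _mm_cvtepi32_pd, which corresponds to cvtdq2pd; a scalar fallback is used when SSE2 isn't available.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not part of the patch): convert two signed
 * 32-bit integers to two doubles. */
static void convert_2xi32_to_2xf64(const int32_t in[2], double out[2]) {
#if defined(__SSE2__)
    /* Load 64 bits (two i32 lanes) into the low half of an XMM register;
     * the upper 64 bits are zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)in);
    /* _mm_cvtepi32_pd corresponds to CVTDQ2PD: it converts the low two
     * i32 lanes to two f64 lanes. */
    _mm_storeu_pd(out, _mm_cvtepi32_pd(v));
#else
    /* Scalar fallback with the same result; every i32 value is exactly
     * representable as a double. */
    out[0] = (double)in[0];
    out[1] = (double)in[1];
#endif
}
```

The two-lane conversion only reads the low 64 bits of the source register, which is the v2i32 shape that type legalization doesn't accept directly.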
Quentin - yes the MMX instructions (CVTPI2PD and CVTPI2PS) could work in a similar fashion. Are we actively adding MMX/3DNow! lowering? I always thought they were just hidden behind their builtin intrinsics.
I'll add the vectorization costs (+ a test) as part of the submission (or possibly as a followup).
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:11
@@ -10,3 +10,3 @@
; SSE2: # BB#0:
; SSE2-NEXT: movd %xmm0, %rax
; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
> I know that this is unrelated to your patch, but I noticed that on SSE2, this i64 extract element has been expanded to 'movd'. Shouldn't this be a 'movq' instead?
This has come up before - I was sure we made a bugzilla for this but can't find it (Sanjay can you remember?).
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:26
@@ -25,3 +25,3 @@
; AVX-NEXT: vmovq %xmm0, %rax
; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0
; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0
> Again, this is unrelated to your patch, but this vxorps seems redundant. I haven't looked at the code, but I suspect that this may be caused by a sub-optimal build_vector lowering.
I'll see if I can find out what's causing it - odd that xmm0 had just been used in exactly the same type of instruction without clearing the upper bits.