[PATCH] D52528: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.

Sat Sep 29 09:38:37 PDT 2018

craig.topper added inline comments.

================
Comment at: test/CodeGen/X86/bitcast-int-to-vector.ll:20
 ; X86-SSE:       # %bb.0:
-; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-SSE-NEXT:    ucomiss {{[0-9]+}}(%esp), %xmm0
+; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; X86-SSE-NEXT:    movaps %xmm0, %xmm1
----------------
For this case we would need to decide if it makes sense to split the load into 2 scalar loads when both elements are extracted separately.
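
To make the trade-off concrete, here is a minimal portable C++ model of the two load strategies. It is only a sketch: `V2F32`, `load_as_f64`, `load_as_two_f32`, and `loads_agree` are hypothetical names invented for illustration, not LLVM code, and a 64-bit integer stands in for the f64 so the bits are copied unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a v2f32 value; not an LLVM type.
struct V2F32 { float lane[2]; };

static_assert(sizeof(float) * 2 == sizeof(std::uint64_t),
              "model assumes two 32-bit floats fill one 64-bit load");

// One 64-bit load (analogous to the movsd in the new output):
// read 8 bytes as a unit, then reinterpret them as two float lanes.
V2F32 load_as_f64(const float *p) {
    assert(p != nullptr);
    std::uint64_t bits;
    std::memcpy(&bits, p, sizeof bits);
    V2F32 v;
    std::memcpy(v.lane, &bits, sizeof bits);
    return v;
}

// Two 32-bit loads (analogous to splitting into scalar loads),
// which suits the case where each element is extracted separately.
V2F32 load_as_two_f32(const float *p) {
    assert(p != nullptr);
    V2F32 v;
    v.lane[0] = p[0];
    v.lane[1] = p[1];
    return v;
}

// Both strategies must yield bit-identical lanes.
bool loads_agree(const float *p) {
    V2F32 a = load_as_f64(p);
    V2F32 b = load_as_two_f32(p);
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```

Both functions return the same lanes. The open question is only which form produces better code when every element is consumed as a scalar.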

================
Comment at: test/CodeGen/X86/vec_extract-avx.ll:174
 ; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
----------------
This regression occurs because DAGCombiner::visitEXTRACT_ELEMENT explicitly avoids splitting a load until after op legalization. So we form a shuffle first, and then we can't recover.

I just checked whether InstCombine would let this sequence through in the first place, and it looks like it will widen the v2f32 to v8f32 and then shuffle the single element into place, the same as what DAGCombine did. This seems not great. Why aren't we recognizing that we don't need the other elements of the v2f32 load?

Repository:
  rL LLVM

https://reviews.llvm.org/D52528