[PATCH] D52528: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.

Sat Sep 29 12:14:15 PDT 2018

spatel added inline comments.

================
Comment at: test/CodeGen/X86/vec_extract-avx.ll:174
 ; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
----------------
craig.topper wrote:
> This regression is because DAGCombiner::visitEXTRACT_ELEMENT explicitly avoids splitting a load until after op legalization. So we form a shuffle first and then we can't recover.
> 
> I just checked to see if InstCombine would let this sequence through in the first place and it looks like it will widen the v2f32 to v8f32 and then shuffle the single element into place. Same as what DAGCombine did. This seems not great. Why aren't we recognizing that we don't need the other elements of the v2f32 load?
Would there be codegen problems if instcombine always scalarized an extractelement of a vector load that has no other uses?

```
define float @load_extract(<4 x float>* %p) {
  %v = load <4 x float>, <4 x float>* %p
  %s = extractelement <4 x float> %v, i32 0
  ret float %s
}
```
-->
```
define float @load_extract(<4 x float>* %p) {
  %bc = bitcast <4 x float>* %p to float*
  %s = load float, float* %bc
  ret float %s
}
```
This would require an address offset (gep) in the general case.
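
For example, here is a sketch of what extracting element 2 (instead of element 0) from the same `<4 x float>` load might become. The function name and the choice of index are hypothetical, and this is not output from an existing transform:
```
define float @load_extract_elt2(<4 x float>* %p) {
  ; reinterpret the vector pointer as a pointer to its element type
  %bc = bitcast <4 x float>* %p to float*
  ; step to element 2 (2 * 4 bytes past the start of the vector)
  %gep = getelementptr float, float* %bc, i64 2
  ; load only the scalar that was extracted
  %s = load float, float* %gep
  ret float %s
}
```
With no explicit alignment, the scalar load falls back to the ABI alignment of float rather than that of the original vector load.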

Repository:
  rL LLVM

https://reviews.llvm.org/D52528