[PATCH] D52528: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.

Sat Oct 6 17:54:54 PDT 2018

craig.topper added a comment.

I believe the only regressions caused by this are already issues in 64-bit mode. Are we concerned about 32-bit mode regressions here? Or can we take this and try to improve these issues as a follow-up?

================
Comment at: test/CodeGen/X86/vec_extract-avx.ll:174
 ; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
----------------
spatel wrote:
> craig.topper wrote:
> > This regression is because DAGCombiner::visitEXTRACT_ELEMENT explicitly avoids splitting a load until after op legalization. So we form a shuffle first and then we can't recover.
> > 
> > I just checked whether InstCombine would let this sequence through in the first place, and it looks like it will widen the v2f32 to v8f32 and then shuffle the single element into place, the same as what DAGCombine did. This doesn't seem great. Why aren't we recognizing that we don't need the other elements of the v2f32 load?
> Would there be codegen problems if InstCombine always scalarized an extractelement of a vector load that has no other uses?
> 
> ```
> define float @load_extract(<4 x float>* %p) {
>   %v = load <4 x float>, <4 x float>* %p
>   %s = extractelement <4 x float> %v, i32 0
>   ret float %s
> }
> ```
> -->
> ```
> define float @load_extract(<4 x float>* %p) {
>   %bc = bitcast <4 x float>* %p to float*
>   %s = load float, float* %bc
>   ret float %s
> }
> ```
> This would require an address offset (a GEP) in the general case, i.e. when the extracted element is not element 0.
I'm not sure.
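For reference, the general case mentioned in the quoted comment (extracting a nonzero lane) would need the GEP to compute the element's address before the scalar load. A minimal sketch, assuming the same typed-pointer IR as the snippets above; the function name `@load_extract_idx2` and the index 2 are hypothetical, chosen only for illustration:

```
; Hypothetical scalarized form of extracting lane 2 from a <4 x float> load.
define float @load_extract_idx2(<4 x float>* %p) {
  %bc = bitcast <4 x float>* %p to float*
  ; Lane 2 starts 2 * 4 = 8 bytes past %p, so an explicit offset is needed.
  %gep = getelementptr inbounds float, float* %bc, i64 2
  %s = load float, float* %gep
  ret float %s
}
```

Lane 0 is the only index where the offset is zero, which is why the earlier example can load directly through the bitcast pointer.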

Repository:
  rL LLVM

https://reviews.llvm.org/D52528