[PATCH] D52528: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 11 04:19:33 PDT 2018
RKSimon accepted this revision.
RKSimon added a comment.
This revision is now accepted and ready to land.
LGTM as long as all the regressions are documented somewhere so we don't lose track of them.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:896
+ setOperationAction(ISD::LOAD, MVT::v2f32, Custom);
+
----------------
Please can you align the arguments into the columns? </pedantic>
================
Comment at: test/CodeGen/X86/vec_extract-avx.ll:174
; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT: vxorps %xmm1, %xmm1, %xmm1
----------------
craig.topper wrote:
> spatel wrote:
> > craig.topper wrote:
> > > This regression is because DAGCombiner::visitEXTRACT_VECTOR_ELT explicitly avoids splitting a load until after op legalization. So we form a shuffle first and then we can't recover.
> > >
> > > I just checked to see if InstCombine would let this sequence through in the first place, and it looks like it will widen the v2f32 to v8f32 and then shuffle the single element into place, the same as what DAGCombine did. This seems not great. Why aren't we recognizing that we don't need the other elements of the v2f32 load?
> > Would there be codegen problems if we always scalarize an extractelement of a vector load with no other uses in instcombine?
> >
> > ```
> > define float @load_extract(<4 x float>* %p) {
> >   %v = load <4 x float>, <4 x float>* %p
> >   %s = extractelement <4 x float> %v, i32 0
> >   ret float %s
> > }
> > ```
> > -->
> > ```
> > define float @load_extract(<4 x float>* %p) {
> >   %bc = bitcast <4 x float>* %p to float*
> >   %s = load float, float* %bc
> >   ret float %s
> > }
> > ```
> > This would require an address offset (gep) in the general case.
> I'm not sure.
A mixture of XFormVExtractWithShuffleIntoLoad and EltsFromConsecutiveLoads would probably help here.
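
For reference, the general case mentioned in the quoted suggestion (a non-zero constant index, which needs a gep) might look like the sketch below. This is illustrative only: the function name `@load_extract_idx2` and the index 2 are hypothetical, and this is not a transform taken from the patch or from InstCombine.

```
; Before: load the whole vector, then extract element 2.
define float @load_extract_idx2(<4 x float>* %p) {
  %v = load <4 x float>, <4 x float>* %p
  %s = extractelement <4 x float> %v, i32 2
  ret float %s
}
```
-->
```
; After (hypothetical): address element 2 directly and load a scalar.
define float @load_extract_idx2(<4 x float>* %p) {
  %bc = bitcast <4 x float>* %p to float*
  %gep = getelementptr float, float* %bc, i64 2
  %s = load float, float* %gep
  ret float %s
}
```

Note that the scalar load can only assume the element alignment, not the alignment of the original vector load.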
Repository:
rL LLVM
https://reviews.llvm.org/D52528