[PATCH] D109065: [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing

Sun Sep 19 07:19:38 PDT 2021

lebedev.ri marked 2 inline comments as done.
lebedev.ri added a comment.

In D109065#3008188 <https://reviews.llvm.org/D109065#3008188>, @RKSimon wrote:

> LGTM

:/ 
Thank you for the review!

================
Comment at: llvm/test/CodeGen/X86/oddshuffles.ll:2268
+; AVX1-NEXT:    vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1],ymm1[2,3,4,5,6,7]
+; AVX1-NEXT:    vmovd %eax, %xmm2
+; AVX1-NEXT:    vpshufd {{.*#+}} xmm2 = xmm2[0,0,0,0]
----------------
RKSimon wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > lebedev.ri wrote:
> > > > RKSimon wrote:
> > > > > Looks like we're missing a fold to share scalar_to_vector(x) and scalar_to_vector(trunc(x)) (maybe worth supporting scalar_to_vector(ext(x)) as well)?
> > > > This stuff is broken.
> > > > We've in AVX1-more, and only have broadcast-from-mem,
> > > > yet we've successfully obscured the load via the truncation.
> > > > 
> > > > I suppose we could look past ext/trunc of scalar_to_vector operand,
> > > > and change it to bitcast/ext of scalar_to_vector itself,
> > > > let me see.
> > > > 
> > > > ```
> > > > Optimized legalized selection DAG: %bb.0 'splat_v3i32:'
> > > > SelectionDAG has 28 nodes:
> > > >   t0: ch = EntryToken
> > > >     t2: i64,ch = CopyFromReg t0, Register:i64 %0
> > > >   t27: i64,ch = load<(load (s64) from %ir.ptr, align 1)> t0, t2, undef:i64
> > > >       t24: v8i32 = BUILD_VECTOR Constant:i32<0>, undef:i32, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
> > > >           t30: v2i64 = scalar_to_vector t27
> > > >         t107: v4i64 = insert_subvector undef:v4i64, t30, Constant:i64<0>
> > > >       t108: v8i32 = bitcast t107
> > > >     t101: v8i32 = X86ISD::BLENDI t24, t108, TargetConstant:i8<2>
> > > >   t19: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t101
> > > >       t25: v8i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, undef:i32, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
> > > >           t91: i32 = truncate t27
> > > >         t92: v4i32 = X86ISD::VBROADCAST t91
> > > >       t94: v8i32 = insert_subvector undef:v8i32, t92, Constant:i64<0>
> > > >     t97: v8i32 = X86ISD::BLENDI t25, t94, TargetConstant:i8<4>
> > > >   t21: ch,glue = CopyToReg t19, Register:v8i32 $ymm1, t97, t19:1
> > > >   t22: ch = X86ISD::RET_FLAG t21, TargetConstant:i32<0>, Register:v8i32 $ymm0, Register:v8i32 $ymm1, t21:1
> > > > ```
> > > While something like this should work:
> > > ```
> > > diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > > index 5a49f33e46fe..4d7c2c2a8651 100644
> > > --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > > +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > > @@ -21824,6 +21824,13 @@ SDValue DAGCombiner::visitSCALAR_TO_VECTOR(SDNode *N) {
> > >      }
> > >    }
> > >  
> > > +  // Fold SCALAR_TO_VECTOR(TRUNCATE(V)) to SCALAR_TO_VECTOR(V),
> > > +  // by making trucation of the operand implicit.
> > > +  if (InVal.getOpcode() == ISD::TRUNCATE && VT.isFixedLengthVector() &&
> > > +      Level < AfterLegalizeDAG)
> > > +    return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), VT,
> > > +                       InVal->getOperand(0));
> > > +
> > >    return SDValue();
> > >  }
> > >  
> > > diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
> > > index 09ba7af6e38a..695cc8303cc1 100644
> > > --- a/llvm/lib/Target/X86/X86ISelLowering.cpp
> > > +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
> > > @@ -14076,10 +14076,12 @@ static SDValue lowerShuffleAsBroadcast(const SDLoc &DL, MVT VT, SDValue V1,
> > >  
> > >    // If this is a scalar, do the broadcast on this type and bitcast.
> > >    if (!V.getValueType().isVector()) {
> > > -    assert(V.getScalarValueSizeInBits() == NumEltBits &&
> > > -           "Unexpected scalar size");
> > > -    MVT BroadcastVT = MVT::getVectorVT(V.getSimpleValueType(),
> > > -                                       VT.getVectorNumElements());
> > > +    if(V.getValueType().isInteger() &&
> > > +       V.getScalarValueSizeInBits() > NumEltBits)
> > > +      V = DAG.getNode(ISD::TRUNCATE, DL, VT.getScalarType(), V);
> > > +    assert(V.getScalarValueSizeInBits() == NumEltBits && "Unexpected scalar size");
> > > +    MVT BroadcastVT =
> > > +        MVT::getVectorVT(V.getSimpleValueType(), VT.getVectorNumElements());
> > >      return DAG.getBitcast(VT, DAG.getNode(Opcode, DL, BroadcastVT, V));
> > >    }
> > > ```
> > > it doesn't catch anything with the cut-off,
> > > and without it. it exposes numerous places that don't expect this truncation to be implicit.
> > > 
> > I've looked again, and i'm not sure i have enough motivation to tackle all the fallout from the
> > `scalar_to_vector(trunc(x)) --> scalar_to_vector(x)` fold, unless i misunderstood the suggestion.
> That's OK - I'll take a look when I get the chance.
Sorry :/

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109065/new/

https://reviews.llvm.org/D109065