[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Mon Oct 1 08:09:32 PDT 2012

Ping.

> -----Original Message-----
> From: Silviu Baranga [mailto:silbar01 at arm.com]
> Sent: 20 September 2012 15:48
> To: 'Jim Grosbach'
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-type
> instructions from extloads on ARM
> 
> Hi Jim,
> 
> Thanks for the comments! I am attaching a new patch with the
> changes applied.
> 
> To make the code target-independent I've added the check that
> the 'middleType' is legal, which is equivalent to the previous
> ARM specific check. I didn't check if the operation is legal,
> since if the code got there it is marked as 'Expand' anyway.
> 
> Cheers,
> Silviu
> 
> 
> > -----Original Message-----
> > From: Jim Grosbach [mailto:grosbach at apple.com]
> > Sent: 19 September 2012 17:56
> > To: Silviu Baranga
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
> type
> > instructions from extloads on ARM
> >
> > Hi Silviu,
> >
> > Making good progress! A few comments on the updated patch inline.
> >
> > > Index: test/CodeGen/ARM/vector-extend-narrow.ll
> > > ===================================================================
> > > --- test/CodeGen/ARM/vector-extend-narrow.ll	(revision 164117)
> > > +++ test/CodeGen/ARM/vector-extend-narrow.ll	(working copy)
> > > @@ -73,3 +73,68 @@
> > >    ret <4 x i32> %2
> > >  }
> > >
> > > +; CHECK: k:
> > > +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > > +  ; CHECK: vld1.32
> > > +  ; CHECK: vmovl.u8
> > > +  ; CHECK: vaddw.u16
> > > +  %1 = load <4 x i8>* %in, align 4
> > > +  %2 = zext <4 x i8> %1 to <4 x i32>
> > > +  %3 = add <4 x i32> %2, %x
> > > +  ret <4 x i32> %3
> > > +}
> > > +; CHECK: l:
> > > +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > > +  ; CHECK: vld1.16
> > > +  ; CHECK: vmovl.u8
> > > +  ; CHECK: vmovl.u16
> > > +  ; CHECK: vaddw.u32
> > > +  %1 = load <2 x i8>* %in, align 4
> > > +  %2 = zext <2 x i8> %1 to <2 x i64>
> > > +  %3 = add <2 x i64> %2, %x
> > > +  ret <2 x i64> %3
> > > +}
> > > +
> > > +; CHECK: m:
> > > +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > > +  ; CHECK: vld1.32
> > > +  ; CHECK: vmovl.u16
> > > +  ; CHECK: vaddw.u32
> > > +  %1 = load <2 x i16>* %in, align 4
> > > +  %2 = zext <2 x i16> %1 to <2 x i64>
> > > +  %3 = add <2 x i64> %2, %x
> > > +  ret <2 x i64> %3
> > > +}
> > > +
> > > +; CHECK: n:
> > > +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > > +  ; CHECK: vld1.32
> > > +  ; CHECK: vmovl.s8
> > > +  ; CHECK: vaddw.s16
> > > +  %1 = load <4 x i8>* %in, align 4
> > > +  %2 = sext <4 x i8> %1 to <4 x i32>
> > > +  %3 = add <4 x i32> %2, %x
> > > +  ret <4 x i32> %3
> > > +}
> > > +; CHECK: o:
> > > +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > > +  ; CHECK: vld1.16
> > > +  ; CHECK: vmovl.s8
> > > +  ; CHECK: vmovl.s16
> > > +  ; CHECK: vaddw.s32
> > > +  %1 = load <2 x i8>* %in, align 4
> > > +  %2 = sext <2 x i8> %1 to <2 x i64>
> > > +  %3 = add <2 x i64> %2, %x
> > > +  ret <2 x i64> %3
> > > +}
> > > +
> > > +; CHECK: p:
> > > +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > > +  ; CHECK: vld1.32
> > > +  ; CHECK: vmovl.s16
> > > +  ; CHECK: vaddw.s32
> > > +  %1 = load <2 x i16>* %in, align 4
> > > +  %2 = sext <2 x i16> %1 to <2 x i64>
> > > +  %3 = add <2 x i64> %2, %x
> > > +  ret <2 x i64> %3
> > > +}
> > > Index: lib/Target/ARM/ARMISelLowering.cpp
> > > ===================================================================
> > > --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164117)
> > > +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> > > @@ -565,6 +570,12 @@
> > >        setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> > >        setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> > >      }
> > > +
> > > +    for (unsigned i = 0; i < 6; ++i) {
> > > +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
> > > +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
> > > +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
> > > +    }
> > >    }
> >
> > I'd have sworn I commented about this bit before, but I can't find it
> > in my email logs.
> >
> > Anyways, This, and the loop above, should be moved out to a helper
> > function and the loop removed. That is, the helper should perform the
> > setOperationAction() calls for a single type, then the top level
> > function just calls that helper 6 times, once for each MVT. The NEON
> > bits for setting legal types do something similar.
> >
> >
> > >
> > >    // ARM and Thumb2 support UMLAL/SMLAL.
> > > Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 164117)
> > > +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
> > > @@ -4342,7 +4342,7 @@
> > >    // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> > >    // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> > >    if ((ISD::isSEXTLoad(N0.getNode()) ||
> > ISD::isEXTLoad(N0.getNode())) &&
> > > -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > > +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> > !VT.isVector()) {
> > >      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> > >      EVT MemVT = LN0->getMemoryVT();
> > >      if ((!LegalOperations && !LN0->isVolatile()) ||
> > > @@ -4678,7 +4678,7 @@
> > >    // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> > >    // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> > >    if ((ISD::isZEXTLoad(N0.getNode()) ||
> > ISD::isEXTLoad(N0.getNode())) &&
> > > -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > > +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> > !VT.isVector()) {
> > >      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> > >      EVT MemVT = LN0->getMemoryVT();
> > >      if ((!LegalOperations && !LN0->isVolatile()) ||
> > > Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/LegalizeTypes.h	(revision 164117)
> > > +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h	(working copy)
> > > @@ -287,6 +287,8 @@
> > >    SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> > >    SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> > >
> > > +  SDValue ExpandExtend(SDNode *N);
> > > +
> > >    void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> ISD::CondCode
> > Code);
> > >
> > >    //===-----------------------------------------------------------
> --
> > -------===//
> > > Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(revision
> > 164117)
> > > +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(working
> > copy)
> > > @@ -732,7 +732,44 @@
> > >  //===-------------------------------------------------------------
> --
> > -------===//
> > >  //  Integer Operand Promotion
> > >  //===-------------------------------------------------------------
> --
> > -------===//
> > > +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
> >
> > This function really needs a comment explaining its purpose and
> perhaps
> > a different name. It's not really doing a full expand of the extend,
> > but rather doing some housekeeping beforehand to make sure the
> > expansion that results from the Promote*() call will be reasonable
> for
> > the target. In any case, some comments about the motivation and logic
> > would be very helpful.
> >
> > > +  EVT VT0 = N->getValueType(0);
> > > +  EVT VT1 = N->getOperand(0).getValueType();
> > > +  DebugLoc dl = N->getDebugLoc();
> > > +  // We only split the node if the operation is not legal,
> otherwise
> > we could get
> > > +  // a legalization loop.
> > > +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> > TargetLowering::Legal)
> > > +    return SDValue();
> > >
> > > +  if (!VT0.isVector())
> > > +    return SDValue();
> > > +
> > > +  // We don't want expand if the intermediate type is the v2i16,
> > > +  // since v2i16 is not a legal type. This only happens when the
> > > +  // destination has the v2i32 type and the source is v2i8.
> > > +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
> > > +    return SDValue();
> >
> > This is in generic code now, so we can't make assumptions about what
> > types are legal. We need to ask the target. This function needs to be
> > made more generic, basically. For example, here we assume that v2i16
> > isn't legal. That's true for ARM, but quite possibly untrue for other
> > targets that support 32-bit vector types. Likewise, we may need to
> > check for which operations are legal as well as which types.
> >
> > > +  // If there is already an instruction that could be directly
> > > +  // selected for this there is no reason to expand since it will
> > > +  // generate a redundant expand instruction (from type X to type
> > X).
> > > +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> > > +    return SDValue();
> > > +
> > > +  LLVMContext *Context = DAG.getContext();
> > > +
> > > +  // We need to get the intermediate type. This type will be twice
> > as
> > > +  // small as the destination type, since all expand instructions
> > are
> > > +  // from a length X to a length 2*X and we have expand
> instructions
> > > +  // for all destination types.
> > > +  EVT middleType = EVT::getVectorVT(*Context,
> > > +                          EVT::getIntegerVT(*Context,
> > > +
> VT0.getScalarType().getSizeInBits()
> > / 2),
> > > +                          VT0.getVectorNumElements());
> > > +  return DAG.getNode(N->getOpcode(), dl, VT0,
> > > +                     DAG.getNode(N->getOpcode(), dl,
> > > +                                 middleType, N->getOperand(0)));
> > > +}
> > > +
> > >  /// PromoteIntegerOperand - This method is called when the
> specified
> > operand of
> > >  /// the specified node is found to need promotion.  At this point,
> > all of the
> > >  /// result types of the node are known to be legal, but other
> > operands of the
> > > @@ -752,7 +789,10 @@
> > >    #endif
> > >      llvm_unreachable("Do not know how to promote this operator's
> > operand!");
> > >
> > > -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N); break;
> > > +  case ISD::ANY_EXTEND:
> > > +    Res = ExpandExtend(N);
> > > +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> > > +    break;
> > >    case ISD::ATOMIC_STORE:
> > >      Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> > >      break;
> > > @@ -774,15 +814,20 @@
> > >    case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
> break;
> > >    case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> > break;
> > >    case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
> break;
> > > -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
> break;
> > > +  case ISD::SIGN_EXTEND:
> > > +    Res = ExpandExtend(N);
> > > +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> > > +    break;
> > >    case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
> > >    case ISD::STORE:        Res =
> > PromoteIntOp_STORE(cast<StoreSDNode>(N),
> > >                                                     OpNo); break;
> > >    case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> > >    case ISD::FP16_TO_FP32:
> > >    case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> > > -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
> break;
> > > -
> > > +  case ISD::ZERO_EXTEND:
> > > +    Res = ExpandExtend(N);
> > > +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> > > +    break;
> > >    case ISD::SHL:
> > >    case ISD::SRA:
> > >    case ISD::SRL:
> > > Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 164117)
> > > +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
> > > @@ -2565,7 +2565,8 @@
> > >              VT.getVectorNumElements() ==
> > >              Operand.getValueType().getVectorNumElements()) &&
> > >             "Vector element count mismatch!");
> > > -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> > ISD::ZERO_EXTEND)
> > > +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> > ISD::ZERO_EXTEND) &&
> > > +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > >        return getNode(OpOpcode, DL, VT, Operand.getNode()-
> > >getOperand(0));
> > >      else if (OpOpcode == ISD::UNDEF)
> > >        // sext(undef) = 0, because the top bits will all be the
> same.
> > > @@ -2581,7 +2582,9 @@
> > >              VT.getVectorNumElements() ==
> > >              Operand.getValueType().getVectorNumElements()) &&
> > >             "Vector element count mismatch!");
> > > -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
> (zext
> > x)
> > > +    if (OpOpcode == ISD::ZERO_EXTEND &&
> > > +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > > +      // (zext (zext x)) -> (zext x)
> > >        return getNode(ISD::ZERO_EXTEND, DL, VT,
> > >                       Operand.getNode()->getOperand(0));
> > >      else if (OpOpcode == ISD::UNDEF)
> > > @@ -2599,8 +2602,9 @@
> > >              Operand.getValueType().getVectorNumElements()) &&
> > >             "Vector element count mismatch!");
> > >
> > > -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> ISD::SIGN_EXTEND
> > ||
> > > -        OpOpcode == ISD::ANY_EXTEND)
> > > +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> > ISD::SIGN_EXTEND ||
> > > +        OpOpcode == ISD::ANY_EXTEND) &&
> > > +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > >        // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext
> x)
> > >        return getNode(OpOpcode, DL, VT, Operand.getNode()-
> > >getOperand(0));
> > >      else if (OpOpcode == ISD::UNDEF)
> >
> >
> >
> >
> > On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com> wrote:
> >
> > > Hi Jim,
> > >
> > > Thanks for the review. It does make sense to do this in a target
> > > independent way. I've attached a new patch that does this.
> > >
> > > I've moved the code ARM-specific code into the legalizer and also
> > > added tests for sext nodes. It appears that I was mistaken before
> > > and (sext (sextload)) patterns were not prevented from folding for
> > > vector types. This was stopping the generation of vaddl-type
> > > instructions. I think that stopping the folding of (sext
> (sextload))
> > > patterns is still OK, because the merging would result in a illegal
> > > operation.
> > >
> > > Cheers,
> > > Silviu
> > >
> > >> -----Original Message-----
> > >> From: Jim Grosbach [mailto:grosbach at apple.com]
> > >> Sent: 17 September 2012 18:09
> > >> To: Silviu Baranga
> > >> Cc: llvm-commits at cs.uiuc.edu
> > >> Subject: Re: [llvm-commits] [patch] Enable the generation of
> vaddl-
> > type
> > >> instructions from extloads on ARM
> > >>
> > >> Hi Silviu,
> > >>
> > >> Thanks for the update. This is definitely much better. Why do we
> > need
> > >> any ARM specific code here, though? Given the problem description,
> > it
> > >> seems like this is something the legalizer should be able to
> reason
> > >> about without needing custom handling in the target.
> > >>
> > >> -Jim
> > >>
> > >>
> > >> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
> wrote:
> > >>
> > >>> Hi Jim,
> > >>>
> > >>> Thanks for the review. The BREAK node solution was a bit ugly.
> > >>>
> > >>> I've attached a new patch that solves this without a BREAK node.
> > >>> This is done by stopping the folding of nodes zext nodes when
> > >>> this will result in a non-legal operation.
> > >>>
> > >>> I also had to stop the (zext (zextload x)) folding for vector
> > >>> types. This is already being done for sext nodes.
> > >>>
> > >>> Thanks,
> > >>> Silviu
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> > >>>> Sent: 06 September 2012 01:42
> > >>>> To: Silviu Baranga
> > >>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> > vaddl-
> > >> type
> > >>>> instructions from extloads on ARM
> > >>>>
> > >>>> Hi Silviu,
> > >>>>
> > >>>> I like the idea of teaching the compiler to use these
> > instructions,
> > >> but
> > >>>> the BREAK pseudo-instruction feels pretty gross to me. I'd much
> > >> prefer
> > >>>> we figure out how to teach the combiner and isel to be smarter
> > >> directly
> > >>>> rather than doing something like that.
> > >>>>
> > >>>> -Jim
> > >>>>
> > >>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com>
> > wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> Currently LLVM will not generate a vaddl-type instructions when
> > an
> > >>>> extload
> > >>>>> node is in the input dag. All vector extload nodes are
> > implemented
> > >>>> with
> > >>>>> a load instruction and a lengthening instruction on ARM. If the
> > >>>> lengthening
> > >>>>> instruction is followed by an add instruction, it would be
> better
> > >> to
> > >>>>> generate
> > >>>>> a vaddl-type instruction instead.
> > >>>>>
> > >>>>> To do this, we split all lengthening dag nodes which extend a
> > type
> > >> to
> > >>>> more
> > >>>>> then twice its size (this is OK because these nodes would be
> > >>>> implemented by
> > >>>>> multiple instructions anyway) into two lengthening nodes. One
> > node
> > >>>> will be
> > >>>>> later
> > >>>>> combined to form the extload node, and the other can be
> selected
> > as
> > >>>> part of
> > >>>>> a
> > >>>>> vaddl-type instruction (there are already patterns for this).
> > >>>>>
> > >>>>> To stop the dag nodes from re-combining, we insert a BREAK node
> > >>>> between
> > >>>>> them. The
> > >>>>> BREAK node is used as an optimization barrier for the DAG
> > combiner,
> > >>>> and
> > >>>>> might be
> > >>>>> useful in other cases as well. We remove the BREAK node in the
> > >>>> instruction
> > >>>>> selection stage.
> > >>>>>
> > >>>>> This approach allows the reuse of existing selection patterns.
> > >>>>>
> > >>>>> This patch depends on the following other patch:
> > >>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-
> > >>>> 20120827/149201.
> > >>>>> html
> > >>>>>
> > >>>>> Please review!
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> Silviu<patch.diff>_______________________________________________
> > >>>>> llvm-commits mailing list
> > >>>>> llvm-commits at cs.uiuc.edu
> > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>>>
> > >>>>
> > >>> <zext_split.diff>
> > >>
> > >>
> > > <zext_split.diff>
> >
> >