[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Mon Oct 8 08:32:34 PDT 2012

Hi Jim, Evan,

Jim - Thanks for the comments. Some answers:

> > +
> > +  // If there is already an instruction that could be directly
> > +  // selected for this there is no reason to expand since it will
> > +  // generate a redundant expand instruction (from type X to type
> X).
> > +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> > +    return SDValue();
> 
> I don't follow this. It seems pretty arbitrary. The comment doesn't
> help much, since how do we know what instructions could be selected or
> not? Can you clarify?
If VT1.getSizeInBits() * 2) is greater or equal to VT0.getSizeInBits(), the 
middleType will have a size smaller or equal to the type that we are
extending from and it doesn't make sense to continue (it doesn't make
sense to extend from a larger type to a smaller type).
I'll make sure to add more detailed comments.

> > +
> > +  // We need to get the intermediate type. This type will be twice
> as
> > +  // small as the destination type,
> 
> s/twice as small as/half the size of/
> 
> > since all expand instructions are
> > +  // from a length X to a length 2*X and we have expand instructions
> > +  // for all destination types.
> 
> Not sure I understand what you mean by "we have expand instructions for
> all destination types." Can you clarify?
The "we have expand instructions for all destination types" comment
should be removed. I think it remained from a previous versions of the
patch.

Evan - Is this approach a good idea in general?

Thanks,
Silviu

> -----Original Message-----
> From: Jim Grosbach [mailto:grosbach at apple.com]
> Sent: 02 October 2012 21:54
> To: Silviu Baranga
> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-type
> instructions from extloads on ARM
> 
> Hi Silviu,
> 
> Specific comments below.
> 
> Evan, you're more expert than I in this part of the code. There's
> something about this whole thing that feels a bit funny, but I can't
> put my finger on it effectively enough to be very helpful. Mind having
> a look? It'll probably be obvious one way or the other to you.
> 
> > Index: test/CodeGen/ARM/vector-extend-narrow.ll
> 
> General comment: thank you for making sure to include test cases. It's
> amazing how many patches don't do that...
> 
> > ===================================================================
> > --- test/CodeGen/ARM/vector-extend-narrow.ll	(revision 164117)
> > +++ test/CodeGen/ARM/vector-extend-narrow.ll	(working copy)
> > @@ -73,3 +73,68 @@
> >    ret <4 x i32> %2
> >  }
> >
> > +; CHECK: k:
> > +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.u8
> > +  ; CHECK: vaddw.u16
> > +  %1 = load <4 x i8>* %in, align 4
> > +  %2 = zext <4 x i8> %1 to <4 x i32>
> > +  %3 = add <4 x i32> %2, %x
> > +  ret <4 x i32> %3
> > +}
> > +; CHECK: l:
> > +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.16
> > +  ; CHECK: vmovl.u8
> > +  ; CHECK: vmovl.u16
> > +  ; CHECK: vaddw.u32
> > +  %1 = load <2 x i8>* %in, align 4
> > +  %2 = zext <2 x i8> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > +
> > +; CHECK: m:
> > +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.u16
> > +  ; CHECK: vaddw.u32
> > +  %1 = load <2 x i16>* %in, align 4
> > +  %2 = zext <2 x i16> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > +
> > +; CHECK: n:
> > +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.s8
> > +  ; CHECK: vaddw.s16
> > +  %1 = load <4 x i8>* %in, align 4
> > +  %2 = sext <4 x i8> %1 to <4 x i32>
> > +  %3 = add <4 x i32> %2, %x
> > +  ret <4 x i32> %3
> > +}
> > +; CHECK: o:
> > +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.16
> > +  ; CHECK: vmovl.s8
> > +  ; CHECK: vmovl.s16
> > +  ; CHECK: vaddw.s32
> > +  %1 = load <2 x i8>* %in, align 4
> > +  %2 = sext <2 x i8> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > +
> > +; CHECK: p:
> > +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.s16
> > +  ; CHECK: vaddw.s32
> > +  %1 = load <2 x i16>* %in, align 4
> > +  %2 = sext <2 x i16> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > Index: lib/Target/ARM/ARMISelLowering.h
> > ===================================================================
> > --- lib/Target/ARM/ARMISelLowering.h	(revision 164117)
> > +++ lib/Target/ARM/ARMISelLowering.h	(working copy)
> > @@ -403,6 +406,8 @@
> >      void addDRTypeForNEON(MVT VT);
> >      void addQRTypeForNEON(MVT VT);
> >
> > +    void addExtendLegalizeTypeForNEON(MVT VT);
> > +
> >      typedef SmallVector<std::pair<unsigned, SDValue>, 8>
> RegsToPassVector;
> >      void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
> >                            SDValue Chain, SDValue &Arg,
> > Index: lib/Target/ARM/ARMISelLowering.cpp
> > ===================================================================
> > --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164117)
> > +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> > @@ -158,6 +158,15 @@
> >    addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
> >  }
> >
> > +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
> > +      setLoadExtAction(ISD::EXTLOAD, VT, Legal);
> > +      setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
> > +      setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
> > +      setOperationAction(ISD::ANY_EXTEND, VT, Expand);
> > +      setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
> > +      setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
> > +}
> 
> Formatting. Only two space indentation required.
> 
> > +
> >  static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
> >    if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
> >      return new TargetLoweringObjectFileMachO();
> > @@ -556,15 +565,17 @@
> >      setTargetDAGCombine(ISD::FP_TO_UINT);
> >      setTargetDAGCombine(ISD::FDIV);
> >
> > -    // It is legal to extload from v4i8 to v4i16 or v4i32.
> > -    MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
> > -                  MVT::v4i16, MVT::v2i16,
> > -                  MVT::v2i32};
> > -    for (unsigned i = 0; i < 6; ++i) {
> > -      setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
> > -      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> > -      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> > -    }
> > +    // There is no instruction to expand v4i8 to v4i32 and
> > +    // we must do this with two vmovl instructions. However,
> > +    // the vmovl instructions can be combined with other
> > +    // lengthening instructions, so we would prefer to lower
> > +    // (zextload x) -> (zext (zextload x).
> > +    addExtendLegalizeTypeForNEON(MVT::v8i8);
> > +    addExtendLegalizeTypeForNEON(MVT::v4i8);
> > +    addExtendLegalizeTypeForNEON(MVT::v2i8);
> > +    addExtendLegalizeTypeForNEON(MVT::v4i16);
> > +    addExtendLegalizeTypeForNEON(MVT::v2i16);
> > +    addExtendLegalizeTypeForNEON(MVT::v2i32);
> >    }
> >
> >    // ARM and Thumb2 support UMLAL/SMLAL.
> > Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 164117)
> > +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
> > @@ -4342,7 +4342,7 @@
> >    // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> >    // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> >    if ((ISD::isSEXTLoad(N0.getNode()) ||
> ISD::isEXTLoad(N0.getNode())) &&
> > -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> !VT.isVector()) {
> >      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >      EVT MemVT = LN0->getMemoryVT();
> >      if ((!LegalOperations && !LN0->isVolatile()) ||
> > @@ -4678,7 +4678,7 @@
> >    // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> >    // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> >    if ((ISD::isZEXTLoad(N0.getNode()) ||
> ISD::isEXTLoad(N0.getNode())) &&
> > -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> !VT.isVector()) {
> >      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >      EVT MemVT = LN0->getMemoryVT();
> >      if ((!LegalOperations && !LN0->isVolatile()) ||
> > Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/LegalizeTypes.h	(revision 164117)
> > +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h	(working copy)
> > @@ -287,6 +287,8 @@
> >    SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> >    SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> >
> > +  SDValue LowerExtendOp(SDNode *N);
> > +
> >    void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode
> Code);
> >
> >    //===-------------------------------------------------------------
> -------===//
> > Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(revision
> 164117)
> > +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(working
> copy)
> > @@ -733,6 +733,51 @@
> >  //  Integer Operand Promotion
> >  //===---------------------------------------------------------------
> -------===//
> >
> > +// LowerExtendOp - This function breaks up vector zext/sext nodes so
> that the
> > +// result will more closely resemble the operations available on the
> target.
> > +// The resulting nodes can be each combined with other nodes to
> create legal
> > +// operations.
> 
> Some addition questions below, but in general, this could probably use
> a bit more description in the comment about what's actually happening
> here. Why this transform is a good idea and exactly what the transform
> is, in particular.
> 
> > +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
> 
> Is "Lower" really the right name for this function?
> 
> > +  EVT VT0 = N->getValueType(0);
> > +  EVT VT1 = N->getOperand(0).getValueType();
> > +  DebugLoc dl = N->getDebugLoc();
> > +  // We only split the node if the operation is not legal, otherwise
> we could get
> > +  // a legalization loop.
> > +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> TargetLowering::Legal)
> 
> You probably just want TLI.isOperationLegalOrCustom() here.
> 
> > +    return SDValue();
> > +
> > +  if (!VT0.isVector())
> > +    return SDValue();
> 
> Minor note, but I'd put this check first as it's cheaper and more
> likely to fire than the check for the operation being legal.
> 
> > +
> > +  // If there is already an instruction that could be directly
> > +  // selected for this there is no reason to expand since it will
> > +  // generate a redundant expand instruction (from type X to type
> X).
> > +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> > +    return SDValue();
> 
> I don't follow this. It seems pretty arbitrary. The comment doesn't
> help much, since how do we know what instructions could be selected or
> not? Can you clarify?
> 
> > +
> > +  LLVMContext *Context = DAG.getContext();
> 
> Can just have this be a reference since that's how it's used below and
> we never can have a NULL context here.
> 
> > +
> > +  // We need to get the intermediate type. This type will be twice
> as
> > +  // small as the destination type,
> 
> s/twice as small as/half the size of/
> 
> > since all expand instructions are
> > +  // from a length X to a length 2*X and we have expand instructions
> > +  // for all destination types.
> 
> Not sure I understand what you mean by "we have expand instructions for
> all destination types." Can you clarify?
> 
> > +  EVT middleType = EVT::getVectorVT(*Context,
> 
> Variable names should start with a Capitol letter. It's just function
> names that are camel-case starting w/ lower case. That said, it's
> unclear what "middle" means here. Perhaps a clearer name or a comment?
> 
> > +                          EVT::getIntegerVT(*Context,
> > +                                VT0.getScalarType().getSizeInBits()
> / 2),
> > +                          VT0.getVectorNumElements());
> > +
> > +  // We don't want to expand if we can't use a legal intermediate
> type
> > +  // since there wouldn't be any benefit from this and we might end
> > +  // up with nodes that cannot be selected in the instruction
> selection
> > +  // stage.
> > +  if (!TLI.isTypeLegal(middleType))
> > +    return SDValue();
> > +
> > +  return DAG.getNode(N->getOpcode(), dl, VT0,
> > +                     DAG.getNode(N->getOpcode(), dl,
> > +                                 middleType, N->getOperand(0)));
> > +}
> > +
> >  /// PromoteIntegerOperand - This method is called when the specified
> operand of
> >  /// the specified node is found to need promotion.  At this point,
> all of the
> >  /// result types of the node are known to be legal, but other
> operands of the
> > @@ -752,7 +797,10 @@
> >    #endif
> >      llvm_unreachable("Do not know how to promote this operator's
> operand!");
> >
> > -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N); break;
> > +  case ISD::ANY_EXTEND:
> > +    Res = LowerExtendOp(N);
> > +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> > +    break;
> >    case ISD::ATOMIC_STORE:
> >      Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> >      break;
> > @@ -774,15 +822,20 @@
> >    case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo); break;
> >    case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> break;
> >    case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo); break;
> > -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N); break;
> > +  case ISD::SIGN_EXTEND:
> > +    Res = LowerExtendOp(N);
> > +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> > +    break;
> >    case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
> >    case ISD::STORE:        Res =
> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> >                                                     OpNo); break;
> >    case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> >    case ISD::FP16_TO_FP32:
> >    case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> > -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N); break;
> > -
> > +  case ISD::ZERO_EXTEND:
> > +    Res = LowerExtendOp(N);
> > +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> > +    break;
> >    case ISD::SHL:
> >    case ISD::SRA:
> >    case ISD::SRL:
> > Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 164117)
> > +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
> > @@ -2565,7 +2565,8 @@
> >              VT.getVectorNumElements() ==
> >              Operand.getValueType().getVectorNumElements()) &&
> >             "Vector element count mismatch!");
> > -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> ISD::ZERO_EXTEND)
> > +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> ISD::ZERO_EXTEND) &&
> > +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >        return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >getOperand(0));
> >      else if (OpOpcode == ISD::UNDEF)
> >        // sext(undef) = 0, because the top bits will all be the same.
> > @@ -2581,7 +2582,9 @@
> >              VT.getVectorNumElements() ==
> >              Operand.getValueType().getVectorNumElements()) &&
> >             "Vector element count mismatch!");
> > -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) -> (zext
> x)
> > +    if (OpOpcode == ISD::ZERO_EXTEND &&
> > +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > +      // (zext (zext x)) -> (zext x)
> >        return getNode(ISD::ZERO_EXTEND, DL, VT,
> >                       Operand.getNode()->getOperand(0));
> >      else if (OpOpcode == ISD::UNDEF)
> > @@ -2599,8 +2602,9 @@
> >              Operand.getValueType().getVectorNumElements()) &&
> >             "Vector element count mismatch!");
> >
> > -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode == ISD::SIGN_EXTEND
> ||
> > -        OpOpcode == ISD::ANY_EXTEND)
> > +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> ISD::SIGN_EXTEND ||
> > +        OpOpcode == ISD::ANY_EXTEND) &&
> > +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >        // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext x)
> >        return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >getOperand(0));
> >      else if (OpOpcode == ISD::UNDEF)
> 
> On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com> wrote:
> 
> > Ping.
> >
> >> -----Original Message-----
> >> From: Silviu Baranga [mailto:silbar01 at arm.com]
> >> Sent: 20 September 2012 15:48
> >> To: 'Jim Grosbach'
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-
> type
> >> instructions from extloads on ARM
> >>
> >> Hi Jim,
> >>
> >> Thanks for the comments! I am attaching a new patch with the
> >> changes applied.
> >>
> >> To make the code target-independent I've added the check that
> >> the 'middleType' is legal, which is equivalent to the previous
> >> ARM specific check. I didn't check if the operation is legal,
> >> since if the code got there it is marked as 'Expand' anyway.
> >>
> >> Cheers,
> >> Silviu
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>> Sent: 19 September 2012 17:56
> >>> To: Silviu Baranga
> >>> Cc: llvm-commits at cs.uiuc.edu
> >>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
> >> type
> >>> instructions from extloads on ARM
> >>>
> >>> Hi Silviu,
> >>>
> >>> Making good progress! A few comments on the updated patch inline.
> >>>
> >>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
> >>>>
> ===================================================================
> >>>> --- test/CodeGen/ARM/vector-extend-narrow.ll	(revision 164117)
> >>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll	(working copy)
> >>>> @@ -73,3 +73,68 @@
> >>>>   ret <4 x i32> %2
> >>>> }
> >>>>
> >>>> +; CHECK: k:
> >>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.u8
> >>>> +  ; CHECK: vaddw.u16
> >>>> +  %1 = load <4 x i8>* %in, align 4
> >>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
> >>>> +  %3 = add <4 x i32> %2, %x
> >>>> +  ret <4 x i32> %3
> >>>> +}
> >>>> +; CHECK: l:
> >>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.16
> >>>> +  ; CHECK: vmovl.u8
> >>>> +  ; CHECK: vmovl.u16
> >>>> +  ; CHECK: vaddw.u32
> >>>> +  %1 = load <2 x i8>* %in, align 4
> >>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: m:
> >>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.u16
> >>>> +  ; CHECK: vaddw.u32
> >>>> +  %1 = load <2 x i16>* %in, align 4
> >>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: n:
> >>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.s8
> >>>> +  ; CHECK: vaddw.s16
> >>>> +  %1 = load <4 x i8>* %in, align 4
> >>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
> >>>> +  %3 = add <4 x i32> %2, %x
> >>>> +  ret <4 x i32> %3
> >>>> +}
> >>>> +; CHECK: o:
> >>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.16
> >>>> +  ; CHECK: vmovl.s8
> >>>> +  ; CHECK: vmovl.s16
> >>>> +  ; CHECK: vaddw.s32
> >>>> +  %1 = load <2 x i8>* %in, align 4
> >>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: p:
> >>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.s16
> >>>> +  ; CHECK: vaddw.s32
> >>>> +  %1 = load <2 x i16>* %in, align 4
> >>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> Index: lib/Target/ARM/ARMISelLowering.cpp
> >>>>
> ===================================================================
> >>>> --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164117)
> >>>> +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> >>>> @@ -565,6 +570,12 @@
> >>>>       setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> >>>>       setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> >>>>     }
> >>>> +
> >>>> +    for (unsigned i = 0; i < 6; ++i) {
> >>>> +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
> >>>> +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
> >>>> +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
> >>>> +    }
> >>>>   }
> >>>
> >>> I'd have sworn I commented about this bit before, but I can't find
> it
> >>> in my email logs.
> >>>
> >>> Anyways, This, and the loop above, should be moved out to a helper
> >>> function and the loop removed. That is, the helper should perform
> the
> >>> setOperationAction() calls for a single type, then the top level
> >>> function just calls that helper 6 times, once for each MVT. The
> NEON
> >>> bits for setting legal types do something similar.
> >>>
> >>>
> >>>>
> >>>>   // ARM and Thumb2 support UMLAL/SMLAL.
> >>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
> >>>> @@ -4342,7 +4342,7 @@
> >>>>   // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> >>>>   // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> >>>>   if ((ISD::isSEXTLoad(N0.getNode()) ||
> >>> ISD::isEXTLoad(N0.getNode())) &&
> >>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>> !VT.isVector()) {
> >>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>     EVT MemVT = LN0->getMemoryVT();
> >>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>> @@ -4678,7 +4678,7 @@
> >>>>   // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> >>>>   // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> >>>>   if ((ISD::isZEXTLoad(N0.getNode()) ||
> >>> ISD::isEXTLoad(N0.getNode())) &&
> >>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>> !VT.isVector()) {
> >>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>     EVT MemVT = LN0->getMemoryVT();
> >>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h	(revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h	(working copy)
> >>>> @@ -287,6 +287,8 @@
> >>>>   SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> >>>>   SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> >>>>
> >>>> +  SDValue ExpandExtend(SDNode *N);
> >>>> +
> >>>>   void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> >> ISD::CondCode
> >>> Code);
> >>>>
> >>>>   //===-----------------------------------------------------------
> >> --
> >>> -------===//
> >>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(revision
> >>> 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(working
> >>> copy)
> >>>> @@ -732,7 +732,44 @@
> >>>> //===-------------------------------------------------------------
> >> --
> >>> -------===//
> >>>> //  Integer Operand Promotion
> >>>> //===-------------------------------------------------------------
> >> --
> >>> -------===//
> >>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
> >>>
> >>> This function really needs a comment explaining its purpose and
> >> perhaps
> >>> a different name. It's not really doing a full expand of the
> extend,
> >>> but rather doing some housekeeping beforehand to make sure the
> >>> expansion that results from the Promote*() call will be reasonable
> >> for
> >>> the target. In any case, some comments about the motivation and
> logic
> >>> would be very helpful.
> >>>
> >>>> +  EVT VT0 = N->getValueType(0);
> >>>> +  EVT VT1 = N->getOperand(0).getValueType();
> >>>> +  DebugLoc dl = N->getDebugLoc();
> >>>> +  // We only split the node if the operation is not legal,
> >> otherwise
> >>> we could get
> >>>> +  // a legalization loop.
> >>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> >>> TargetLowering::Legal)
> >>>> +    return SDValue();
> >>>>
> >>>> +  if (!VT0.isVector())
> >>>> +    return SDValue();
> >>>> +
> >>>> +  // We don't want expand if the intermediate type is the v2i16,
> >>>> +  // since v2i16 is not a legal type. This only happens when the
> >>>> +  // destination has the v2i32 type and the source is v2i8.
> >>>> +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
> >>>> +    return SDValue();
> >>>
> >>> This is in generic code now, so we can't make assumptions about
> what
> >>> types are legal. We need to ask the target. This function needs to
> be
> >>> made more generic, basically. For example, here we assume that
> v2i16
> >>> isn't legal. That's true for ARM, but quite possibly untrue for
> other
> >>> targets that support 32-bit vector types. Likewise, we may need to
> >>> check for which operations are legal as well as which types.
> >>>
> >>>> +  // If there is already an instruction that could be directly
> >>>> +  // selected for this there is no reason to expand since it will
> >>>> +  // generate a redundant expand instruction (from type X to type
> >>> X).
> >>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> >>>> +    return SDValue();
> >>>> +
> >>>> +  LLVMContext *Context = DAG.getContext();
> >>>> +
> >>>> +  // We need to get the intermediate type. This type will be
> twice
> >>> as
> >>>> +  // small as the destination type, since all expand instructions
> >>> are
> >>>> +  // from a length X to a length 2*X and we have expand
> >> instructions
> >>>> +  // for all destination types.
> >>>> +  EVT middleType = EVT::getVectorVT(*Context,
> >>>> +                          EVT::getIntegerVT(*Context,
> >>>> +
> >> VT0.getScalarType().getSizeInBits()
> >>> / 2),
> >>>> +                          VT0.getVectorNumElements());
> >>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
> >>>> +                     DAG.getNode(N->getOpcode(), dl,
> >>>> +                                 middleType, N->getOperand(0)));
> >>>> +}
> >>>> +
> >>>> /// PromoteIntegerOperand - This method is called when the
> >> specified
> >>> operand of
> >>>> /// the specified node is found to need promotion.  At this point,
> >>> all of the
> >>>> /// result types of the node are known to be legal, but other
> >>> operands of the
> >>>> @@ -752,7 +789,10 @@
> >>>>   #endif
> >>>>     llvm_unreachable("Do not know how to promote this operator's
> >>> operand!");
> >>>>
> >>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
> break;
> >>>> +  case ISD::ANY_EXTEND:
> >>>> +    Res = ExpandExtend(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::ATOMIC_STORE:
> >>>>     Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> >>>>     break;
> >>>> @@ -774,15 +814,20 @@
> >>>>   case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
> >> break;
> >>>>   case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> >>> break;
> >>>>   case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
> >> break;
> >>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
> >> break;
> >>>> +  case ISD::SIGN_EXTEND:
> >>>> +    Res = ExpandExtend(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
> >>>>   case ISD::STORE:        Res =
> >>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> >>>>                                                    OpNo); break;
> >>>>   case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> >>>>   case ISD::FP16_TO_FP32:
> >>>>   case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> >>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
> >> break;
> >>>> -
> >>>> +  case ISD::ZERO_EXTEND:
> >>>> +    Res = ExpandExtend(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::SHL:
> >>>>   case ISD::SRA:
> >>>>   case ISD::SRL:
> >>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
> >>>> @@ -2565,7 +2565,8 @@
> >>>>             VT.getVectorNumElements() ==
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>> ISD::ZERO_EXTEND)
> >>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>> ISD::ZERO_EXTEND) &&
> >>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>> getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>>       // sext(undef) = 0, because the top bits will all be the
> >> same.
> >>>> @@ -2581,7 +2582,9 @@
> >>>>             VT.getVectorNumElements() ==
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
> >> (zext
> >>> x)
> >>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
> >>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>> +      // (zext (zext x)) -> (zext x)
> >>>>       return getNode(ISD::ZERO_EXTEND, DL, VT,
> >>>>                      Operand.getNode()->getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>> @@ -2599,8 +2602,9 @@
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>>
> >>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >> ISD::SIGN_EXTEND
> >>> ||
> >>>> -        OpOpcode == ISD::ANY_EXTEND)
> >>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >>> ISD::SIGN_EXTEND ||
> >>>> +        OpOpcode == ISD::ANY_EXTEND) &&
> >>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>       // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext
> >> x)
> >>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>> getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>
> >>>
> >>>
> >>>
> >>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com>
> wrote:
> >>>
> >>>> Hi Jim,
> >>>>
> >>>> Thanks for the review. It does make sense to do this in a target
> >>>> independent way. I've attached a new patch that does this.
> >>>>
> >>>> I've moved the code ARM-specific code into the legalizer and also
> >>>> added tests for sext nodes. It appears that I was mistaken before
> >>>> and (sext (sextload)) patterns were not prevented from folding for
> >>>> vector types. This was stopping the generation of vaddl-type
> >>>> instructions. I think that stopping the folding of (sext
> >> (sextload))
> >>>> patterns is still OK, because the merging would result in a
> illegal
> >>>> operation.
> >>>>
> >>>> Cheers,
> >>>> Silviu
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>> Sent: 17 September 2012 18:09
> >>>>> To: Silviu Baranga
> >>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >> vaddl-
> >>> type
> >>>>> instructions from extloads on ARM
> >>>>>
> >>>>> Hi Silviu,
> >>>>>
> >>>>> Thanks for the update. This is definitely much better. Why do we
> >>> need
> >>>>> any ARM specific code here, though? Given the problem
> description,
> >>> it
> >>>>> seems like this is something the legalizer should be able to
> >> reason
> >>>>> about without needing custom handling in the target.
> >>>>>
> >>>>> -Jim
> >>>>>
> >>>>>
> >>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
> >> wrote:
> >>>>>
> >>>>>> Hi Jim,
> >>>>>>
> >>>>>> Thanks for the review. The BREAK node solution was a bit ugly.
> >>>>>>
> >>>>>> I've attached a new patch that solves this without a BREAK node.
> >>>>>> This is done by stopping the folding of nodes zext nodes when
> >>>>>> this will result in a non-legal operation.
> >>>>>>
> >>>>>> I also had to stop the (zext (zextload x)) folding for vector
> >>>>>> types. This is already being done for sext nodes.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Silviu
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>>>> Sent: 06 September 2012 01:42
> >>>>>>> To: Silviu Baranga
> >>>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >>> vaddl-
> >>>>> type
> >>>>>>> instructions from extloads on ARM
> >>>>>>>
> >>>>>>> Hi Silviu,
> >>>>>>>
> >>>>>>> I like the idea of teaching the compiler to use these
> >>> instructions,
> >>>>> but
> >>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd much
> >>>>> prefer
> >>>>>>> we figure out how to teach the combiner and isel to be smarter
> >>>>> directly
> >>>>>>> rather than doing something like that.
> >>>>>>>
> >>>>>>> -Jim
> >>>>>>>
> >>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Currently LLVM will not generate a vaddl-type instructions
> when
> >>> an
> >>>>>>> extload
> >>>>>>>> node is in the input dag. All vector extload nodes are
> >>> implemented
> >>>>>>> with
> >>>>>>>> a load instruction and a lengthening instruction on ARM. If
> the
> >>>>>>> lengthening
> >>>>>>>> instruction is followed by an add instruction, it would be
> >> better
> >>>>> to
> >>>>>>>> generate
> >>>>>>>> a vaddl-type instruction instead.
> >>>>>>>>
> >>>>>>>> To do this, we split all lengthening dag nodes which extend a
> >>> type
> >>>>> to
> >>>>>>> more
> >>>>>>>> then twice its size (this is OK because these nodes would be
> >>>>>>> implemented by
> >>>>>>>> multiple instructions anyway) into two lengthening nodes. One
> >>> node
> >>>>>>> will be
> >>>>>>>> later
> >>>>>>>> combined to form the extload node, and the other can be
> >> selected
> >>> as
> >>>>>>> part of
> >>>>>>>> a
> >>>>>>>> vaddl-type instruction (there are already patterns for this).
> >>>>>>>>
> >>>>>>>> To stop the dag nodes from re-combining, we insert a BREAK
> node
> >>>>>>> between
> >>>>>>>> them. The
> >>>>>>>> BREAK node is used as an optimization barrier for the DAG
> >>> combiner,
> >>>>>>> and
> >>>>>>>> might be
> >>>>>>>> useful in other cases as well. We remove the BREAK node in the
> >>>>>>> instruction
> >>>>>>>> selection stage.
> >>>>>>>>
> >>>>>>>> This approach allows the reuse of existing selection patterns.
> >>>>>>>>
> >>>>>>>> This patch depends on the following other patch:
> >>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-
> >>>>>>> 20120827/149201.
> >>>>>>>> html
> >>>>>>>>
> >>>>>>>> Please review!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >> Silviu<patch.diff>_______________________________________________
> >>>>>>>> llvm-commits mailing list
> >>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>
> >>>>>>>
> >>>>>> <zext_split.diff>
> >>>>>
> >>>>>
> >>>> <zext_split.diff>
> >>>
> >>>
> >
> >
> >
> >
> 
>