[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Silviu Baranga silbar01 at arm.com
Mon Nov 26 08:37:56 PST 2012


Hi Jim,

I finally got around to re-writing this patch. Sorry for taking
so long.

I've refactored this patch to address your previous comments.
The code now uses "isOperationLegalOrCustom" to decide whether it
can introduce the new zext/sext node for the intermediate type.
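
In other words, before the split is introduced the code now does a
check along these lines (only a sketch, not the exact diff; "MidVT"
stands for the half-width intermediate vector type, computed the same
way as "middleType" in the earlier version of LowerExtendOp quoted
below):

  // Only introduce the extra extend node if the target can actually
  // select an extend from the intermediate type; otherwise leave the
  // node alone and let the normal promotion path handle it.
  if (!TLI.isOperationLegalOrCustom(N->getOpcode(), MidVT))
    return SDValue();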

On the ARM side I've set the zext action for 64-bit vectors to legal,
and everything else to expand. All extload-type vector actions are
also set to expand now, since there are no instructions that
implement them.
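
Roughly, the ARM-side setup boils down to the following (simplified;
the exact list of types and the matching sext/anyext actions are in
the attached patch):

  // A zext whose source is a full 64-bit (D-register) vector maps
  // directly onto a single vmovl, so those stay legal.
  setOperationAction(ISD::ZERO_EXTEND, MVT::v8i8,  Legal);
  setOperationAction(ISD::ZERO_EXTEND, MVT::v4i16, Legal);
  setOperationAction(ISD::ZERO_EXTEND, MVT::v2i32, Legal);
  // Narrower source vectors need more than one instruction, so expand.
  setOperationAction(ISD::ZERO_EXTEND, MVT::v4i8,  Expand);
  setOperationAction(ISD::ZERO_EXTEND, MVT::v2i8,  Expand);
  setOperationAction(ISD::ZERO_EXTEND, MVT::v2i16, Expand);
  // No single instruction does the load and the widening in one go,
  // so the vector extload actions are expanded as well.
  setLoadExtAction(ISD::EXTLOAD,  MVT::v4i8, Expand);
  setLoadExtAction(ISD::SEXTLOAD, MVT::v4i8, Expand);
  setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Expand);
  // ...and similarly for the remaining vector types.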

As for the "LowerExtendOp" name, I'm open to suggestions.

This also fixes PR11570.

Cheers,
Silviu

> -----Original Message-----
> From: Jim Grosbach [mailto:grosbach at apple.com]
> Sent: 02 October 2012 21:54
> To: Silviu Baranga
> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-type
> instructions from extloads on ARM
> 
> Hi Silviu,
> 
> Specific comments below.
> 
> Evan, you're more expert than I in this part of the code. There's
> something about this whole thing that feels a bit funny, but I can't
> put my finger on it effectively enough to be very helpful. Mind having
> a look? It'll probably be obvious one way or the other to you.
> 
> > Index: test/CodeGen/ARM/vector-extend-narrow.ll
> 
> General comment: thank you for making sure to include test cases. It's
> amazing how many patches don't do that...
> 
> > ===================================================================
> > --- test/CodeGen/ARM/vector-extend-narrow.ll	(revision 164117)
> > +++ test/CodeGen/ARM/vector-extend-narrow.ll	(working copy)
> > @@ -73,3 +73,68 @@
> >    ret <4 x i32> %2
> >  }
> >
> > +; CHECK: k:
> > +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.u8
> > +  ; CHECK: vaddw.u16
> > +  %1 = load <4 x i8>* %in, align 4
> > +  %2 = zext <4 x i8> %1 to <4 x i32>
> > +  %3 = add <4 x i32> %2, %x
> > +  ret <4 x i32> %3
> > +}
> > +; CHECK: l:
> > +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.16
> > +  ; CHECK: vmovl.u8
> > +  ; CHECK: vmovl.u16
> > +  ; CHECK: vaddw.u32
> > +  %1 = load <2 x i8>* %in, align 4
> > +  %2 = zext <2 x i8> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > +
> > +; CHECK: m:
> > +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.u16
> > +  ; CHECK: vaddw.u32
> > +  %1 = load <2 x i16>* %in, align 4
> > +  %2 = zext <2 x i16> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > +
> > +; CHECK: n:
> > +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.s8
> > +  ; CHECK: vaddw.s16
> > +  %1 = load <4 x i8>* %in, align 4
> > +  %2 = sext <4 x i8> %1 to <4 x i32>
> > +  %3 = add <4 x i32> %2, %x
> > +  ret <4 x i32> %3
> > +}
> > +; CHECK: o:
> > +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.16
> > +  ; CHECK: vmovl.s8
> > +  ; CHECK: vmovl.s16
> > +  ; CHECK: vaddw.s32
> > +  %1 = load <2 x i8>* %in, align 4
> > +  %2 = sext <2 x i8> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > +
> > +; CHECK: p:
> > +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > +  ; CHECK: vld1.32
> > +  ; CHECK: vmovl.s16
> > +  ; CHECK: vaddw.s32
> > +  %1 = load <2 x i16>* %in, align 4
> > +  %2 = sext <2 x i16> %1 to <2 x i64>
> > +  %3 = add <2 x i64> %2, %x
> > +  ret <2 x i64> %3
> > +}
> > Index: lib/Target/ARM/ARMISelLowering.h
> > ===================================================================
> > --- lib/Target/ARM/ARMISelLowering.h	(revision 164117)
> > +++ lib/Target/ARM/ARMISelLowering.h	(working copy)
> > @@ -403,6 +406,8 @@
> >      void addDRTypeForNEON(MVT VT);
> >      void addQRTypeForNEON(MVT VT);
> >
> > +    void addExtendLegalizeTypeForNEON(MVT VT);
> > +
> >      typedef SmallVector<std::pair<unsigned, SDValue>, 8>
> RegsToPassVector;
> >      void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
> >                            SDValue Chain, SDValue &Arg,
> > Index: lib/Target/ARM/ARMISelLowering.cpp
> > ===================================================================
> > --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164117)
> > +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> > @@ -158,6 +158,15 @@
> >    addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
> >  }
> >
> > +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
> > +      setLoadExtAction(ISD::EXTLOAD, VT, Legal);
> > +      setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
> > +      setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
> > +      setOperationAction(ISD::ANY_EXTEND, VT, Expand);
> > +      setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
> > +      setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
> > +}
> 
> Formatting. Only two space indentation required.
> 
> > +
> >  static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
> >    if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
> >      return new TargetLoweringObjectFileMachO();
> > @@ -556,15 +565,17 @@
> >      setTargetDAGCombine(ISD::FP_TO_UINT);
> >      setTargetDAGCombine(ISD::FDIV);
> >
> > -    // It is legal to extload from v4i8 to v4i16 or v4i32.
> > -    MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
> > -                  MVT::v4i16, MVT::v2i16,
> > -                  MVT::v2i32};
> > -    for (unsigned i = 0; i < 6; ++i) {
> > -      setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
> > -      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> > -      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> > -    }
> > +    // There is no instruction to expand v4i8 to v4i32 and
> > +    // we must do this with two vmovl instructions. However,
> > +    // the vmovl instructions can be combined with other
> > +    // lengthening instructions, so we would prefer to lower
> > +    // (zextload x) -> (zext (zextload x)).
> > +    addExtendLegalizeTypeForNEON(MVT::v8i8);
> > +    addExtendLegalizeTypeForNEON(MVT::v4i8);
> > +    addExtendLegalizeTypeForNEON(MVT::v2i8);
> > +    addExtendLegalizeTypeForNEON(MVT::v4i16);
> > +    addExtendLegalizeTypeForNEON(MVT::v2i16);
> > +    addExtendLegalizeTypeForNEON(MVT::v2i32);
> >    }
> >
> >    // ARM and Thumb2 support UMLAL/SMLAL.
> > Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 164117)
> > +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
> > @@ -4342,7 +4342,7 @@
> >    // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> >    // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> >    if ((ISD::isSEXTLoad(N0.getNode()) ||
> ISD::isEXTLoad(N0.getNode())) &&
> > -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> !VT.isVector()) {
> >      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >      EVT MemVT = LN0->getMemoryVT();
> >      if ((!LegalOperations && !LN0->isVolatile()) ||
> > @@ -4678,7 +4678,7 @@
> >    // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> >    // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> >    if ((ISD::isZEXTLoad(N0.getNode()) ||
> ISD::isEXTLoad(N0.getNode())) &&
> > -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> !VT.isVector()) {
> >      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >      EVT MemVT = LN0->getMemoryVT();
> >      if ((!LegalOperations && !LN0->isVolatile()) ||
> > Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/LegalizeTypes.h	(revision 164117)
> > +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h	(working copy)
> > @@ -287,6 +287,8 @@
> >    SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> >    SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> >
> > +  SDValue LowerExtendOp(SDNode *N);
> > +
> >    void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode
> Code);
> >
> >    //===-------------------------------------------------------------
> -------===//
> > Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(revision
> 164117)
> > +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(working
> copy)
> > @@ -733,6 +733,51 @@
> >  //  Integer Operand Promotion
> >  //===---------------------------------------------------------------
> -------===//
> >
> > +// LowerExtendOp - This function breaks up vector zext/sext nodes so
> that the
> > +// result will more closely resemble the operations available on the
> target.
> > +// The resulting nodes can each be combined with other nodes to
> create legal
> > +// operations.
> 
> Some additional questions below, but in general, this could probably use
> a bit more description in the comment about what's actually happening
> here. Why this transform is a good idea and exactly what the transform
> is, in particular.
> 
> > +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
> 
> Is "Lower" really the right name for this function?
> 
> > +  EVT VT0 = N->getValueType(0);
> > +  EVT VT1 = N->getOperand(0).getValueType();
> > +  DebugLoc dl = N->getDebugLoc();
> > +  // We only split the node if the operation is not legal, otherwise
> we could get
> > +  // a legalization loop.
> > +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> TargetLowering::Legal)
> 
> You probably just want TLI.isOperationLegalOrCustom() here.
> 
> > +    return SDValue();
> > +
> > +  if (!VT0.isVector())
> > +    return SDValue();
> 
> Minor note, but I'd put this check first as it's cheaper and more
> likely to fire than the check for the operation being legal.
> 
> > +
> > +  // If there is already an instruction that could be directly
> > +  // selected for this there is no reason to expand since it will
> > +  // generate a redundant expand instruction (from type X to type
> X).
> > +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> > +    return SDValue();
> 
> I don't follow this. It seems pretty arbitrary. The comment doesn't
> help much, since how do we know what instructions could be selected or
> not? Can you clarify?
> 
> > +
> > +  LLVMContext *Context = DAG.getContext();
> 
> Can just have this be a reference since that's how it's used below and
> we never can have a NULL context here.
> 
> > +
> > +  // We need to get the intermediate type. This type will be twice
> as
> > +  // small as the destination type,
> 
> s/twice as small as/half the size of/
> 
> > since all expand instructions are
> > +  // from a length X to a length 2*X and we have expand instructions
> > +  // for all destination types.
> 
> Not sure I understand what you mean by "we have expand instructions for
> all destination types." Can you clarify?
> 
> > +  EVT middleType = EVT::getVectorVT(*Context,
> 
> Variable names should start with a capital letter. It's just function
> names that are camel-case starting w/ lower case. That said, it's
> unclear what "middle" means here. Perhaps a clearer name or a comment?
> 
> > +                          EVT::getIntegerVT(*Context,
> > +                                VT0.getScalarType().getSizeInBits()
> / 2),
> > +                          VT0.getVectorNumElements());
> > +
> > +  // We don't want to expand if we can't use a legal intermediate
> type
> > +  // since there wouldn't be any benefit from this and we might end
> > +  // up with nodes that cannot be selected in the instruction
> selection
> > +  // stage.
> > +  if (!TLI.isTypeLegal(middleType))
> > +    return SDValue();
> > +
> > +  return DAG.getNode(N->getOpcode(), dl, VT0,
> > +                     DAG.getNode(N->getOpcode(), dl,
> > +                                 middleType, N->getOperand(0)));
> > +}
> > +
> >  /// PromoteIntegerOperand - This method is called when the specified
> operand of
> >  /// the specified node is found to need promotion.  At this point,
> all of the
> >  /// result types of the node are known to be legal, but other
> operands of the
> > @@ -752,7 +797,10 @@
> >    #endif
> >      llvm_unreachable("Do not know how to promote this operator's
> operand!");
> >
> > -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N); break;
> > +  case ISD::ANY_EXTEND:
> > +    Res = LowerExtendOp(N);
> > +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> > +    break;
> >    case ISD::ATOMIC_STORE:
> >      Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> >      break;
> > @@ -774,15 +822,20 @@
> >    case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo); break;
> >    case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> break;
> >    case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo); break;
> > -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N); break;
> > +  case ISD::SIGN_EXTEND:
> > +    Res = LowerExtendOp(N);
> > +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> > +    break;
> >    case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
> >    case ISD::STORE:        Res =
> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> >                                                     OpNo); break;
> >    case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> >    case ISD::FP16_TO_FP32:
> >    case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> > -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N); break;
> > -
> > +  case ISD::ZERO_EXTEND:
> > +    Res = LowerExtendOp(N);
> > +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> > +    break;
> >    case ISD::SHL:
> >    case ISD::SRA:
> >    case ISD::SRL:
> > Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> > ===================================================================
> > --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 164117)
> > +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
> > @@ -2565,7 +2565,8 @@
> >              VT.getVectorNumElements() ==
> >              Operand.getValueType().getVectorNumElements()) &&
> >             "Vector element count mismatch!");
> > -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> ISD::ZERO_EXTEND)
> > +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> ISD::ZERO_EXTEND) &&
> > +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >        return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >getOperand(0));
> >      else if (OpOpcode == ISD::UNDEF)
> >        // sext(undef) = 0, because the top bits will all be the same.
> > @@ -2581,7 +2582,9 @@
> >              VT.getVectorNumElements() ==
> >              Operand.getValueType().getVectorNumElements()) &&
> >             "Vector element count mismatch!");
> > -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) -> (zext
> x)
> > +    if (OpOpcode == ISD::ZERO_EXTEND &&
> > +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > +      // (zext (zext x)) -> (zext x)
> >        return getNode(ISD::ZERO_EXTEND, DL, VT,
> >                       Operand.getNode()->getOperand(0));
> >      else if (OpOpcode == ISD::UNDEF)
> > @@ -2599,8 +2602,9 @@
> >              Operand.getValueType().getVectorNumElements()) &&
> >             "Vector element count mismatch!");
> >
> > -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode == ISD::SIGN_EXTEND
> ||
> > -        OpOpcode == ISD::ANY_EXTEND)
> > +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> ISD::SIGN_EXTEND ||
> > +        OpOpcode == ISD::ANY_EXTEND) &&
> > +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >        // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext x)
> >        return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >getOperand(0));
> >      else if (OpOpcode == ISD::UNDEF)
> 
> On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com> wrote:
> 
> > Ping.
> >
> >> -----Original Message-----
> >> From: Silviu Baranga [mailto:silbar01 at arm.com]
> >> Sent: 20 September 2012 15:48
> >> To: 'Jim Grosbach'
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-
> type
> >> instructions from extloads on ARM
> >>
> >> Hi Jim,
> >>
> >> Thanks for the comments! I am attaching a new patch with the
> >> changes applied.
> >>
> >> To make the code target-independent I've added the check that
> >> the 'middleType' is legal, which is equivalent to the previous
> >> ARM specific check. I didn't check if the operation is legal,
> >> since if the code got there it is marked as 'Expand' anyway.
> >>
> >> Cheers,
> >> Silviu
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>> Sent: 19 September 2012 17:56
> >>> To: Silviu Baranga
> >>> Cc: llvm-commits at cs.uiuc.edu
> >>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
> >> type
> >>> instructions from extloads on ARM
> >>>
> >>> Hi Silviu,
> >>>
> >>> Making good progress! A few comments on the updated patch inline.
> >>>
> >>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
> >>>>
> ===================================================================
> >>>> --- test/CodeGen/ARM/vector-extend-narrow.ll	(revision 164117)
> >>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll	(working copy)
> >>>> @@ -73,3 +73,68 @@
> >>>>   ret <4 x i32> %2
> >>>> }
> >>>>
> >>>> +; CHECK: k:
> >>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.u8
> >>>> +  ; CHECK: vaddw.u16
> >>>> +  %1 = load <4 x i8>* %in, align 4
> >>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
> >>>> +  %3 = add <4 x i32> %2, %x
> >>>> +  ret <4 x i32> %3
> >>>> +}
> >>>> +; CHECK: l:
> >>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.16
> >>>> +  ; CHECK: vmovl.u8
> >>>> +  ; CHECK: vmovl.u16
> >>>> +  ; CHECK: vaddw.u32
> >>>> +  %1 = load <2 x i8>* %in, align 4
> >>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: m:
> >>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.u16
> >>>> +  ; CHECK: vaddw.u32
> >>>> +  %1 = load <2 x i16>* %in, align 4
> >>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: n:
> >>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.s8
> >>>> +  ; CHECK: vaddw.s16
> >>>> +  %1 = load <4 x i8>* %in, align 4
> >>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
> >>>> +  %3 = add <4 x i32> %2, %x
> >>>> +  ret <4 x i32> %3
> >>>> +}
> >>>> +; CHECK: o:
> >>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.16
> >>>> +  ; CHECK: vmovl.s8
> >>>> +  ; CHECK: vmovl.s16
> >>>> +  ; CHECK: vaddw.s32
> >>>> +  %1 = load <2 x i8>* %in, align 4
> >>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: p:
> >>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.s16
> >>>> +  ; CHECK: vaddw.s32
> >>>> +  %1 = load <2 x i16>* %in, align 4
> >>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> Index: lib/Target/ARM/ARMISelLowering.cpp
> >>>>
> ===================================================================
> >>>> --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164117)
> >>>> +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> >>>> @@ -565,6 +570,12 @@
> >>>>       setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> >>>>       setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> >>>>     }
> >>>> +
> >>>> +    for (unsigned i = 0; i < 6; ++i) {
> >>>> +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
> >>>> +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
> >>>> +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
> >>>> +    }
> >>>>   }
> >>>
> >>> I'd have sworn I commented about this bit before, but I can't find
> it
> >>> in my email logs.
> >>>
> >>> Anyways, This, and the loop above, should be moved out to a helper
> >>> function and the loop removed. That is, the helper should perform
> the
> >>> setOperationAction() calls for a single type, then the top level
> >>> function just calls that helper 6 times, once for each MVT. The
> NEON
> >>> bits for setting legal types do something similar.
> >>>
> >>>
> >>>>
> >>>>   // ARM and Thumb2 support UMLAL/SMLAL.
> >>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
> >>>> @@ -4342,7 +4342,7 @@
> >>>>   // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> >>>>   // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> >>>>   if ((ISD::isSEXTLoad(N0.getNode()) ||
> >>> ISD::isEXTLoad(N0.getNode())) &&
> >>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>> !VT.isVector()) {
> >>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>     EVT MemVT = LN0->getMemoryVT();
> >>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>> @@ -4678,7 +4678,7 @@
> >>>>   // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> >>>>   // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> >>>>   if ((ISD::isZEXTLoad(N0.getNode()) ||
> >>> ISD::isEXTLoad(N0.getNode())) &&
> >>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>> !VT.isVector()) {
> >>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>     EVT MemVT = LN0->getMemoryVT();
> >>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h	(revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h	(working copy)
> >>>> @@ -287,6 +287,8 @@
> >>>>   SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> >>>>   SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> >>>>
> >>>> +  SDValue ExpandExtend(SDNode *N);
> >>>> +
> >>>>   void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> >> ISD::CondCode
> >>> Code);
> >>>>
> >>>>   //===-----------------------------------------------------------
> >> --
> >>> -------===//
> >>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(revision
> >>> 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(working
> >>> copy)
> >>>> @@ -732,7 +732,44 @@
> >>>> //===-------------------------------------------------------------
> >> --
> >>> -------===//
> >>>> //  Integer Operand Promotion
> >>>> //===-------------------------------------------------------------
> >> --
> >>> -------===//
> >>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
> >>>
> >>> This function really needs a comment explaining its purpose and
> >> perhaps
> >>> a different name. It's not really doing a full expand of the
> extend,
> >>> but rather doing some housekeeping beforehand to make sure the
> >>> expansion that results from the Promote*() call will be reasonable
> >> for
> >>> the target. In any case, some comments about the motivation and
> logic
> >>> would be very helpful.
> >>>
> >>>> +  EVT VT0 = N->getValueType(0);
> >>>> +  EVT VT1 = N->getOperand(0).getValueType();
> >>>> +  DebugLoc dl = N->getDebugLoc();
> >>>> +  // We only split the node if the operation is not legal,
> >> otherwise
> >>> we could get
> >>>> +  // a legalization loop.
> >>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> >>> TargetLowering::Legal)
> >>>> +    return SDValue();
> >>>>
> >>>> +  if (!VT0.isVector())
> >>>> +    return SDValue();
> >>>> +
> >>>> +  // We don't want to expand if the intermediate type is v2i16,
> >>>> +  // since v2i16 is not a legal type. This only happens when the
> >>>> +  // destination has the v2i32 type and the source is v2i8.
> >>>> +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
> >>>> +    return SDValue();
> >>>
> >>> This is in generic code now, so we can't make assumptions about
> what
> >>> types are legal. We need to ask the target. This function needs to
> be
> >>> made more generic, basically. For example, here we assume that
> v2i16
> >>> isn't legal. That's true for ARM, but quite possibly untrue for
> other
> >>> targets that support 32-bit vector types. Likewise, we may need to
> >>> check for which operations are legal as well as which types.
> >>>
> >>>> +  // If there is already an instruction that could be directly
> >>>> +  // selected for this there is no reason to expand since it will
> >>>> +  // generate a redundant expand instruction (from type X to type
> >>> X).
> >>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> >>>> +    return SDValue();
> >>>> +
> >>>> +  LLVMContext *Context = DAG.getContext();
> >>>> +
> >>>> +  // We need to get the intermediate type. This type will be
> twice
> >>> as
> >>>> +  // small as the destination type, since all expand instructions
> >>> are
> >>>> +  // from a length X to a length 2*X and we have expand
> >> instructions
> >>>> +  // for all destination types.
> >>>> +  EVT middleType = EVT::getVectorVT(*Context,
> >>>> +                          EVT::getIntegerVT(*Context,
> >>>> +
> >> VT0.getScalarType().getSizeInBits()
> >>> / 2),
> >>>> +                          VT0.getVectorNumElements());
> >>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
> >>>> +                     DAG.getNode(N->getOpcode(), dl,
> >>>> +                                 middleType, N->getOperand(0)));
> >>>> +}
> >>>> +
> >>>> /// PromoteIntegerOperand - This method is called when the
> >> specified
> >>> operand of
> >>>> /// the specified node is found to need promotion.  At this point,
> >>> all of the
> >>>> /// result types of the node are known to be legal, but other
> >>> operands of the
> >>>> @@ -752,7 +789,10 @@
> >>>>   #endif
> >>>>     llvm_unreachable("Do not know how to promote this operator's
> >>> operand!");
> >>>>
> >>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
> break;
> >>>> +  case ISD::ANY_EXTEND:
> >>>> +    Res = ExpandExtend(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::ATOMIC_STORE:
> >>>>     Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> >>>>     break;
> >>>> @@ -774,15 +814,20 @@
> >>>>   case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
> >> break;
> >>>>   case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> >>> break;
> >>>>   case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
> >> break;
> >>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
> >> break;
> >>>> +  case ISD::SIGN_EXTEND:
> >>>> +    Res = ExpandExtend(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
> >>>>   case ISD::STORE:        Res =
> >>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> >>>>                                                    OpNo); break;
> >>>>   case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> >>>>   case ISD::FP16_TO_FP32:
> >>>>   case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> >>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
> >> break;
> >>>> -
> >>>> +  case ISD::ZERO_EXTEND:
> >>>> +    Res = ExpandExtend(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::SHL:
> >>>>   case ISD::SRA:
> >>>>   case ISD::SRL:
> >>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
> >>>> @@ -2565,7 +2565,8 @@
> >>>>             VT.getVectorNumElements() ==
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>> ISD::ZERO_EXTEND)
> >>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>> ISD::ZERO_EXTEND) &&
> >>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>> getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>>       // sext(undef) = 0, because the top bits will all be the
> >> same.
> >>>> @@ -2581,7 +2582,9 @@
> >>>>             VT.getVectorNumElements() ==
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
> >> (zext
> >>> x)
> >>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
> >>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>> +      // (zext (zext x)) -> (zext x)
> >>>>       return getNode(ISD::ZERO_EXTEND, DL, VT,
> >>>>                      Operand.getNode()->getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>> @@ -2599,8 +2602,9 @@
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>>
> >>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >> ISD::SIGN_EXTEND
> >>> ||
> >>>> -        OpOpcode == ISD::ANY_EXTEND)
> >>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >>> ISD::SIGN_EXTEND ||
> >>>> +        OpOpcode == ISD::ANY_EXTEND) &&
> >>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>       // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext
> >> x)
> >>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>> getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>
> >>>
> >>>
> >>>
> >>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com>
> wrote:
> >>>
> >>>> Hi Jim,
> >>>>
> >>>> Thanks for the review. It does make sense to do this in a target
> >>>> independent way. I've attached a new patch that does this.
> >>>>
> >>>> I've moved the ARM-specific code into the legalizer and also
> >>>> added tests for sext nodes. It appears that I was mistaken before
> >>>> and (sext (sextload)) patterns were not prevented from folding for
> >>>> vector types. This was stopping the generation of vaddl-type
> >>>> instructions. I think that stopping the folding of (sext
> >> (sextload))
> >>>> patterns is still OK, because the merging would result in an
> illegal
> >>>> operation.
> >>>>
> >>>> Cheers,
> >>>> Silviu
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>> Sent: 17 September 2012 18:09
> >>>>> To: Silviu Baranga
> >>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >> vaddl-
> >>> type
> >>>>> instructions from extloads on ARM
> >>>>>
> >>>>> Hi Silviu,
> >>>>>
> >>>>> Thanks for the update. This is definitely much better. Why do we
> >>> need
> >>>>> any ARM specific code here, though? Given the problem
> description,
> >>> it
> >>>>> seems like this is something the legalizer should be able to
> >> reason
> >>>>> about without needing custom handling in the target.
> >>>>>
> >>>>> -Jim
> >>>>>
> >>>>>
> >>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
> >> wrote:
> >>>>>
> >>>>>> Hi Jim,
> >>>>>>
> >>>>>> Thanks for the review. The BREAK node solution was a bit ugly.
> >>>>>>
> >>>>>> I've attached a new patch that solves this without a BREAK node.
> >>>>>> This is done by stopping the folding of zext nodes when
> >>>>>> this will result in a non-legal operation.
> >>>>>>
> >>>>>> I also had to stop the (zext (zextload x)) folding for vector
> >>>>>> types. This is already being done for sext nodes.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Silviu
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>>>> Sent: 06 September 2012 01:42
> >>>>>>> To: Silviu Baranga
> >>>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >>> vaddl-
> >>>>> type
> >>>>>>> instructions from extloads on ARM
> >>>>>>>
> >>>>>>> Hi Silviu,
> >>>>>>>
> >>>>>>> I like the idea of teaching the compiler to use these
> >>> instructions,
> >>>>> but
> >>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd much
> >>>>> prefer
> >>>>>>> we figure out how to teach the combiner and isel to be smarter
> >>>>> directly
> >>>>>>> rather than doing something like that.
> >>>>>>>
> >>>>>>> -Jim
> >>>>>>>
> >>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Currently LLVM will not generate vaddl-type instructions
> when
> >>> an
> >>>>>>> extload
> >>>>>>>> node is in the input dag. All vector extload nodes are
> >>> implemented
> >>>>>>> with
> >>>>>>>> a load instruction and a lengthening instruction on ARM. If
> the
> >>>>>>> lengthening
> >>>>>>>> instruction is followed by an add instruction, it would be
> >> better
> >>>>> to
> >>>>>>>> generate
> >>>>>>>> a vaddl-type instruction instead.
> >>>>>>>>
> >>>>>>>> To do this, we split all lengthening dag nodes which extend a
> >>> type
> >>>>> to
> >>>>>>> more
> >>>>>>>> than twice its size (this is OK because these nodes would be
> >>>>>>> implemented by
> >>>>>>>> multiple instructions anyway) into two lengthening nodes. One
> >>> node
> >>>>>>> will be
> >>>>>>>> later
> >>>>>>>> combined to form the extload node, and the other can be
> >> selected
> >>> as
> >>>>>>> part of
> >>>>>>>> a
> >>>>>>>> vaddl-type instruction (there are already patterns for this).
> >>>>>>>>
> >>>>>>>> To stop the dag nodes from re-combining, we insert a BREAK
> node
> >>>>>>> between
> >>>>>>>> them. The
> >>>>>>>> BREAK node is used as an optimization barrier for the DAG
> >>> combiner,
> >>>>>>> and
> >>>>>>>> might be
> >>>>>>>> useful in other cases as well. We remove the BREAK node in the
> >>>>>>> instruction
> >>>>>>>> selection stage.
> >>>>>>>>
> >>>>>>>> This approach allows the reuse of existing selection patterns.
> >>>>>>>>
> >>>>>>>> This patch depends on the following other patch:
> >>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-
> >>>>>>> 20120827/149201.
> >>>>>>>> html
> >>>>>>>>
> >>>>>>>> Please review!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >> Silviu<patch.diff>_______________________________________________
> >>>>>>>> llvm-commits mailing list
> >>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>
> >>>>>>>
> >>>>>> <zext_split.diff>
> >>>>>
> >>>>>
> >>>> <zext_split.diff>
> >>>
> >>>
> >
> >
> >
> >
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zext.diff
Type: application/octet-stream
Size: 13045 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121126/f9836124/attachment.obj>

