[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Silviu Baranga Silviu.Baranga at arm.com
Fri Dec 7 08:41:40 PST 2012


Hi Evan,

Sure. The patch is attached to this email.

Thanks,
Silviu

> -----Original Message-----
> From: Evan Cheng [mailto:evan.cheng at apple.com]
> Sent: 07 December 2012 16:39
> To: Silviu Baranga
> Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-type
> instructions from extloads on ARM
>
> Can you attach the patch?
>
> Thanks,
>
> Evan
>
> Sent from my iPad
>
> On Dec 7, 2012, at 4:33 AM, Silviu Baranga <silbar01 at arm.com> wrote:
>
> > Ping!
> >
> > --Silviu
> >
> >> -----Original Message-----
> >> From: Silviu Baranga [mailto:silbar01 at arm.com]
> >> Sent: 26 November 2012 16:38
> >> To: 'Jim Grosbach'
> >> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
> >> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-
> type
> >> instructions from extloads on ARM
> >>
> >> Hi Jim,
> >>
> >> I finally got around to re-writing this patch. Sorry for taking
> >> so long.
> >>
> >> I've refactored this patch to address your previous comments.
> >> The code now uses "isOperationLegalOrCustom" to decide if it
> >> can introduce the new zext/sext node for the intermediary type.
> >>
> >> On the ARM side I've set the zext action for 64 bit vectors to
> legal,
> >> and everything else to expand. All extload-type vector actions
> >> are also set to expand now, since there are no instructions that
> >> implement this.
> >>
> >> As for the "LowerExtendOp" name, I'm open to suggestions.
> >>
> >> This also fixes PR11570.
> >>
> >> Cheers,
> >> Silviu
> >>
> >>> -----Original Message-----
> >>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>> Sent: 02 October 2012 21:54
> >>> To: Silviu Baranga
> >>> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
> >>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
> >> type
> >>> instructions from extloads on ARM
> >>>
> >>> Hi Silviu,
> >>>
> >>> Specific comments below.
> >>>
> >>> Evan, you're more expert than I in this part of the code. There's
> >>> something about this whole thing that feels a bit funny, but I
> can't
> >>> put my finger on it effectively enough to be very helpful. Mind
> >> having
> >>> a look? It'll probably be obvious one way or the other to you.
> >>>
> >>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
> >>>
> >>> General comment: thank you for making sure to include test cases.
> >> It's
> >>> amazing how many patches don't do that...
> >>>
> >>>>
> ===================================================================
> >>>> --- test/CodeGen/ARM/vector-extend-narrow.ll    (revision 164117)
> >>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll    (working copy)
> >>>> @@ -73,3 +73,68 @@
> >>>>   ret <4 x i32> %2
> >>>> }
> >>>>
> >>>> +; CHECK: k:
> >>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.u8
> >>>> +  ; CHECK: vaddw.u16
> >>>> +  %1 = load <4 x i8>* %in, align 4
> >>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
> >>>> +  %3 = add <4 x i32> %2, %x
> >>>> +  ret <4 x i32> %3
> >>>> +}
> >>>> +; CHECK: l:
> >>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.16
> >>>> +  ; CHECK: vmovl.u8
> >>>> +  ; CHECK: vmovl.u16
> >>>> +  ; CHECK: vaddw.u32
> >>>> +  %1 = load <2 x i8>* %in, align 4
> >>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: m:
> >>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.u16
> >>>> +  ; CHECK: vaddw.u32
> >>>> +  %1 = load <2 x i16>* %in, align 4
> >>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: n:
> >>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.s8
> >>>> +  ; CHECK: vaddw.s16
> >>>> +  %1 = load <4 x i8>* %in, align 4
> >>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
> >>>> +  %3 = add <4 x i32> %2, %x
> >>>> +  ret <4 x i32> %3
> >>>> +}
> >>>> +; CHECK: o:
> >>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.16
> >>>> +  ; CHECK: vmovl.s8
> >>>> +  ; CHECK: vmovl.s16
> >>>> +  ; CHECK: vaddw.s32
> >>>> +  %1 = load <2 x i8>* %in, align 4
> >>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> +
> >>>> +; CHECK: p:
> >>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>> +  ; CHECK: vld1.32
> >>>> +  ; CHECK: vmovl.s16
> >>>> +  ; CHECK: vaddw.s32
> >>>> +  %1 = load <2 x i16>* %in, align 4
> >>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
> >>>> +  %3 = add <2 x i64> %2, %x
> >>>> +  ret <2 x i64> %3
> >>>> +}
> >>>> Index: lib/Target/ARM/ARMISelLowering.h
> >>>>
> ===================================================================
> >>>> --- lib/Target/ARM/ARMISelLowering.h    (revision 164117)
> >>>> +++ lib/Target/ARM/ARMISelLowering.h    (working copy)
> >>>> @@ -403,6 +406,8 @@
> >>>>     void addDRTypeForNEON(MVT VT);
> >>>>     void addQRTypeForNEON(MVT VT);
> >>>>
> >>>> +    void addExtendLegalizeTypeForNEON(MVT VT);
> >>>> +
> >>>>     typedef SmallVector<std::pair<unsigned, SDValue>, 8>
> >>> RegsToPassVector;
> >>>>     void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
> >>>>                           SDValue Chain, SDValue &Arg,
> >>>> Index: lib/Target/ARM/ARMISelLowering.cpp
> >>>>
> ===================================================================
> >>>> --- lib/Target/ARM/ARMISelLowering.cpp    (revision 164117)
> >>>> +++ lib/Target/ARM/ARMISelLowering.cpp    (working copy)
> >>>> @@ -158,6 +158,15 @@
> >>>>   addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
> >>>> }
> >>>>
> >>>> +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
> >>>> +      setLoadExtAction(ISD::EXTLOAD, VT, Legal);
> >>>> +      setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
> >>>> +      setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
> >>>> +      setOperationAction(ISD::ANY_EXTEND, VT, Expand);
> >>>> +      setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
> >>>> +      setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
> >>>> +}
> >>>
> >>> Formatting. Only two space indentation required.
> >>>
> >>>> +
> >>>> static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
> >>>>   if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
> >>>>     return new TargetLoweringObjectFileMachO();
> >>>> @@ -556,15 +565,17 @@
> >>>>     setTargetDAGCombine(ISD::FP_TO_UINT);
> >>>>     setTargetDAGCombine(ISD::FDIV);
> >>>>
> >>>> -    // It is legal to extload from v4i8 to v4i16 or v4i32.
> >>>> -    MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
> >>>> -                  MVT::v4i16, MVT::v2i16,
> >>>> -                  MVT::v2i32};
> >>>> -    for (unsigned i = 0; i < 6; ++i) {
> >>>> -      setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
> >>>> -      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> >>>> -      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> >>>> -    }
> >>>> +    // There is no instruction to expand v4i8 to v4i32 and
> >>>> +    // we must do this with two vmovl instructions. However,
> >>>> +    // the vmovl instructions can be combined with other
> >>>> +    // lengthening instructions, so we would prefer to lower
> >>>> +    // (zextload x) -> (zext (zextload x).
> >>>> +    addExtendLegalizeTypeForNEON(MVT::v8i8);
> >>>> +    addExtendLegalizeTypeForNEON(MVT::v4i8);
> >>>> +    addExtendLegalizeTypeForNEON(MVT::v2i8);
> >>>> +    addExtendLegalizeTypeForNEON(MVT::v4i16);
> >>>> +    addExtendLegalizeTypeForNEON(MVT::v2i16);
> >>>> +    addExtendLegalizeTypeForNEON(MVT::v2i32);
> >>>>   }
> >>>>
> >>>>   // ARM and Thumb2 support UMLAL/SMLAL.
> >>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (working copy)
> >>>> @@ -4342,7 +4342,7 @@
> >>>>   // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> >>>>   // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> >>>>   if ((ISD::isSEXTLoad(N0.getNode()) ||
> >>> ISD::isEXTLoad(N0.getNode())) &&
> >>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>> !VT.isVector()) {
> >>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>     EVT MemVT = LN0->getMemoryVT();
> >>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>> @@ -4678,7 +4678,7 @@
> >>>>   // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> >>>>   // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> >>>>   if ((ISD::isZEXTLoad(N0.getNode()) ||
> >>> ISD::isEXTLoad(N0.getNode())) &&
> >>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>> !VT.isVector()) {
> >>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>     EVT MemVT = LN0->getMemoryVT();
> >>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h    (revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h    (working copy)
> >>>> @@ -287,6 +287,8 @@
> >>>>   SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> >>>>   SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> >>>>
> >>>> +  SDValue LowerExtendOp(SDNode *N);
> >>>> +
> >>>>   void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> >> ISD::CondCode
> >>> Code);
> >>>>
> >>>>   //===-----------------------------------------------------------
> >> --
> >>> -------===//
> >>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp    (revision
> >>> 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp    (working
> >>> copy)
> >>>> @@ -733,6 +733,51 @@
> >>>> //  Integer Operand Promotion
> >>>> //===-------------------------------------------------------------
> >> --
> >>> -------===//
> >>>>
> >>>> +// LowerExtendOp - This function breaks up vector zext/sext nodes
> >> so
> >>> that the
> >>>> +// result will more closely resemble the operations available on
> >> the
> >>> target.
> >>>> +// The resulting nodes can be each combined with other nodes to
> >>> create legal
> >>>> +// operations.
> >>>
> >>> Some addition questions below, but in general, this could probably
> >> use
> >>> a bit more description in the comment about what's actually
> happening
> >>> here. Why this transform is a good idea and exactly what the
> >> transform
> >>> is, in particular.
> >>>
> >>>> +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
> >>>
> >>> Is "Lower" really the right name for this function?
> >>>
> >>>> +  EVT VT0 = N->getValueType(0);
> >>>> +  EVT VT1 = N->getOperand(0).getValueType();
> >>>> +  DebugLoc dl = N->getDebugLoc();
> >>>> +  // We only split the node if the operation is not legal,
> >> otherwise
> >>> we could get
> >>>> +  // a legalization loop.
> >>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> >>> TargetLowering::Legal)
> >>>
> >>> You probably just want TLI.isOperationLegalOrCustom() here.
> >>>
> >>>> +    return SDValue();
> >>>> +
> >>>> +  if (!VT0.isVector())
> >>>> +    return SDValue();
> >>>
> >>> Minor note, but I'd put this check first as it's cheaper and more
> >>> likely to fire than the check for the operation being legal.
> >>>
> >>>> +
> >>>> +  // If there is already an instruction that could be directly
> >>>> +  // selected for this there is no reason to expand since it will
> >>>> +  // generate a redundant expand instruction (from type X to type
> >>> X).
> >>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> >>>> +    return SDValue();
> >>>
> >>> I don't follow this. It seems pretty arbitrary. The comment doesn't
> >>> help much, since how do we know what instructions could be selected
> >> or
> >>> not? Can you clarify?
> >>>
> >>>> +
> >>>> +  LLVMContext *Context = DAG.getContext();
> >>>
> >>> Can just have this be a reference since that's how it's used below
> >> and
> >>> we never can have a NULL context here.
> >>>
> >>>> +
> >>>> +  // We need to get the intermediate type. This type will be
> twice
> >>> as
> >>>> +  // small as the destination type,
> >>>
> >>> s/twice as small as/half the size of/
> >>>
> >>>> since all expand instructions are
> >>>> +  // from a length X to a length 2*X and we have expand
> >> instructions
> >>>> +  // for all destination types.
> >>>
> >>> Not sure I understand what you mean by "we have expand instructions
> >> for
> >>> all destination types." Can you clarify?
> >>>
> >>>> +  EVT middleType = EVT::getVectorVT(*Context,
> >>>
> >>> Variable names should start with a Capitol letter. It's just
> function
> >>> names that are camel-case starting w/ lower case. That said, it's
> >>> unclear what "middle" means here. Perhaps a clearer name or a
> >> comment?
> >>>
> >>>> +                          EVT::getIntegerVT(*Context,
> >>>> +
> >> VT0.getScalarType().getSizeInBits()
> >>> / 2),
> >>>> +                          VT0.getVectorNumElements());
> >>>> +
> >>>> +  // We don't want to expand if we can't use a legal intermediate
> >>> type
> >>>> +  // since there wouldn't be any benefit from this and we might
> >> end
> >>>> +  // up with nodes that cannot be selected in the instruction
> >>> selection
> >>>> +  // stage.
> >>>> +  if (!TLI.isTypeLegal(middleType))
> >>>> +    return SDValue();
> >>>> +
> >>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
> >>>> +                     DAG.getNode(N->getOpcode(), dl,
> >>>> +                                 middleType, N->getOperand(0)));
> >>>> +}
> >>>> +
> >>>> /// PromoteIntegerOperand - This method is called when the
> >> specified
> >>> operand of
> >>>> /// the specified node is found to need promotion.  At this point,
> >>> all of the
> >>>> /// result types of the node are known to be legal, but other
> >>> operands of the
> >>>> @@ -752,7 +797,10 @@
> >>>>   #endif
> >>>>     llvm_unreachable("Do not know how to promote this operator's
> >>> operand!");
> >>>>
> >>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
> break;
> >>>> +  case ISD::ANY_EXTEND:
> >>>> +    Res = LowerExtendOp(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::ATOMIC_STORE:
> >>>>     Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> >>>>     break;
> >>>> @@ -774,15 +822,20 @@
> >>>>   case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
> >> break;
> >>>>   case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> >>> break;
> >>>>   case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
> >> break;
> >>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
> >> break;
> >>>> +  case ISD::SIGN_EXTEND:
> >>>> +    Res = LowerExtendOp(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
> >>>>   case ISD::STORE:        Res =
> >>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> >>>>                                                    OpNo); break;
> >>>>   case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> >>>>   case ISD::FP16_TO_FP32:
> >>>>   case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> >>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
> >> break;
> >>>> -
> >>>> +  case ISD::ZERO_EXTEND:
> >>>> +    Res = LowerExtendOp(N);
> >>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> >>>> +    break;
> >>>>   case ISD::SHL:
> >>>>   case ISD::SRA:
> >>>>   case ISD::SRL:
> >>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> >>>>
> ===================================================================
> >>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (revision 164117)
> >>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (working copy)
> >>>> @@ -2565,7 +2565,8 @@
> >>>>             VT.getVectorNumElements() ==
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>> ISD::ZERO_EXTEND)
> >>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>> ISD::ZERO_EXTEND) &&
> >>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>> getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>>       // sext(undef) = 0, because the top bits will all be the
> >> same.
> >>>> @@ -2581,7 +2582,9 @@
> >>>>             VT.getVectorNumElements() ==
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
> >> (zext
> >>> x)
> >>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
> >>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>> +      // (zext (zext x)) -> (zext x)
> >>>>       return getNode(ISD::ZERO_EXTEND, DL, VT,
> >>>>                      Operand.getNode()->getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>> @@ -2599,8 +2602,9 @@
> >>>>             Operand.getValueType().getVectorNumElements()) &&
> >>>>            "Vector element count mismatch!");
> >>>>
> >>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >> ISD::SIGN_EXTEND
> >>> ||
> >>>> -        OpOpcode == ISD::ANY_EXTEND)
> >>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >>> ISD::SIGN_EXTEND ||
> >>>> +        OpOpcode == ISD::ANY_EXTEND) &&
> >>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>       // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext
> >> x)
> >>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>> getOperand(0));
> >>>>     else if (OpOpcode == ISD::UNDEF)
> >>>
> >>> On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com>
> wrote:
> >>>
> >>>> Ping.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Silviu Baranga [mailto:silbar01 at arm.com]
> >>>>> Sent: 20 September 2012 15:48
> >>>>> To: 'Jim Grosbach'
> >>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>> Subject: RE: [llvm-commits] [patch] Enable the generation of
> >> vaddl-
> >>> type
> >>>>> instructions from extloads on ARM
> >>>>>
> >>>>> Hi Jim,
> >>>>>
> >>>>> Thanks for the comments! I am attaching a new patch with the
> >>>>> changes applied.
> >>>>>
> >>>>> To make the code target-independent I've added the check that
> >>>>> the 'middleType' is legal, which is equivalent to the previous
> >>>>> ARM specific check. I didn't check if the operation is legal,
> >>>>> since if the code got there it is marked as 'Expand' anyway.
> >>>>>
> >>>>> Cheers,
> >>>>> Silviu
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>>> Sent: 19 September 2012 17:56
> >>>>>> To: Silviu Baranga
> >>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >> vaddl-
> >>>>> type
> >>>>>> instructions from extloads on ARM
> >>>>>>
> >>>>>> Hi Silviu,
> >>>>>>
> >>>>>> Making good progress! A few comments on the updated patch
> inline.
> >>>>>>
> >>>>>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
> >>>>>>>
> >>> ===================================================================
> >>>>>>> --- test/CodeGen/ARM/vector-extend-narrow.ll    (revision
> 164117)
> >>>>>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll    (working copy)
> >>>>>>> @@ -73,3 +73,68 @@
> >>>>>>>  ret <4 x i32> %2
> >>>>>>> }
> >>>>>>>
> >>>>>>> +; CHECK: k:
> >>>>>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>>>>> +  ; CHECK: vld1.32
> >>>>>>> +  ; CHECK: vmovl.u8
> >>>>>>> +  ; CHECK: vaddw.u16
> >>>>>>> +  %1 = load <4 x i8>* %in, align 4
> >>>>>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
> >>>>>>> +  %3 = add <4 x i32> %2, %x
> >>>>>>> +  ret <4 x i32> %3
> >>>>>>> +}
> >>>>>>> +; CHECK: l:
> >>>>>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>>>>> +  ; CHECK: vld1.16
> >>>>>>> +  ; CHECK: vmovl.u8
> >>>>>>> +  ; CHECK: vmovl.u16
> >>>>>>> +  ; CHECK: vaddw.u32
> >>>>>>> +  %1 = load <2 x i8>* %in, align 4
> >>>>>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
> >>>>>>> +  %3 = add <2 x i64> %2, %x
> >>>>>>> +  ret <2 x i64> %3
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +; CHECK: m:
> >>>>>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>>>>> +  ; CHECK: vld1.32
> >>>>>>> +  ; CHECK: vmovl.u16
> >>>>>>> +  ; CHECK: vaddw.u32
> >>>>>>> +  %1 = load <2 x i16>* %in, align 4
> >>>>>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
> >>>>>>> +  %3 = add <2 x i64> %2, %x
> >>>>>>> +  ret <2 x i64> %3
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +; CHECK: n:
> >>>>>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> >>>>>>> +  ; CHECK: vld1.32
> >>>>>>> +  ; CHECK: vmovl.s8
> >>>>>>> +  ; CHECK: vaddw.s16
> >>>>>>> +  %1 = load <4 x i8>* %in, align 4
> >>>>>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
> >>>>>>> +  %3 = add <4 x i32> %2, %x
> >>>>>>> +  ret <4 x i32> %3
> >>>>>>> +}
> >>>>>>> +; CHECK: o:
> >>>>>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> >>>>>>> +  ; CHECK: vld1.16
> >>>>>>> +  ; CHECK: vmovl.s8
> >>>>>>> +  ; CHECK: vmovl.s16
> >>>>>>> +  ; CHECK: vaddw.s32
> >>>>>>> +  %1 = load <2 x i8>* %in, align 4
> >>>>>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
> >>>>>>> +  %3 = add <2 x i64> %2, %x
> >>>>>>> +  ret <2 x i64> %3
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +; CHECK: p:
> >>>>>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> >>>>>>> +  ; CHECK: vld1.32
> >>>>>>> +  ; CHECK: vmovl.s16
> >>>>>>> +  ; CHECK: vaddw.s32
> >>>>>>> +  %1 = load <2 x i16>* %in, align 4
> >>>>>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
> >>>>>>> +  %3 = add <2 x i64> %2, %x
> >>>>>>> +  ret <2 x i64> %3
> >>>>>>> +}
> >>>>>>> Index: lib/Target/ARM/ARMISelLowering.cpp
> >>>>>>>
> >>> ===================================================================
> >>>>>>> --- lib/Target/ARM/ARMISelLowering.cpp    (revision 164117)
> >>>>>>> +++ lib/Target/ARM/ARMISelLowering.cpp    (working copy)
> >>>>>>> @@ -565,6 +570,12 @@
> >>>>>>>      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> >>>>>>>      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> >>>>>>>    }
> >>>>>>> +
> >>>>>>> +    for (unsigned i = 0; i < 6; ++i) {
> >>>>>>> +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
> >>>>>>> +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
> >>>>>>> +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
> >>>>>>> +    }
> >>>>>>>  }
> >>>>>>
> >>>>>> I'd have sworn I commented about this bit before, but I can't
> >> find
> >>> it
> >>>>>> in my email logs.
> >>>>>>
> >>>>>> Anyways, This, and the loop above, should be moved out to a
> >> helper
> >>>>>> function and the loop removed. That is, the helper should
> perform
> >>> the
> >>>>>> setOperationAction() calls for a single type, then the top level
> >>>>>> function just calls that helper 6 times, once for each MVT. The
> >>> NEON
> >>>>>> bits for setting legal types do something similar.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>  // ARM and Thumb2 support UMLAL/SMLAL.
> >>>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >>>>>>>
> >>> ===================================================================
> >>>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (revision
> 164117)
> >>>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (working copy)
> >>>>>>> @@ -4342,7 +4342,7 @@
> >>>>>>>  // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> >>>>>>>  // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> >>>>>>>  if ((ISD::isSEXTLoad(N0.getNode()) ||
> >>>>>> ISD::isEXTLoad(N0.getNode())) &&
> >>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>>>>> !VT.isVector()) {
> >>>>>>>    LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>>>>    EVT MemVT = LN0->getMemoryVT();
> >>>>>>>    if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>>>>> @@ -4678,7 +4678,7 @@
> >>>>>>>  // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> >>>>>>>  // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> >>>>>>>  if ((ISD::isZEXTLoad(N0.getNode()) ||
> >>>>>> ISD::isEXTLoad(N0.getNode())) &&
> >>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> >>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> >>>>>> !VT.isVector()) {
> >>>>>>>    LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> >>>>>>>    EVT MemVT = LN0->getMemoryVT();
> >>>>>>>    if ((!LegalOperations && !LN0->isVolatile()) ||
> >>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> >>>>>>>
> >>> ===================================================================
> >>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h    (revision
> 164117)
> >>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h    (working copy)
> >>>>>>> @@ -287,6 +287,8 @@
> >>>>>>>  SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> >>>>>>>  SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> >>>>>>>
> >>>>>>> +  SDValue ExpandExtend(SDNode *N);
> >>>>>>> +
> >>>>>>>  void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> >>>>> ISD::CondCode
> >>>>>> Code);
> >>>>>>>
> >>>>>>>  //===---------------------------------------------------------
> >> --
> >>>>> --
> >>>>>> -------===//
> >>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> >>>>>>>
> >>> ===================================================================
> >>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> >>    (revision
> >>>>>> 164117)
> >>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> >>    (working
> >>>>>> copy)
> >>>>>>> @@ -732,7 +732,44 @@
> >>>>>>> //===----------------------------------------------------------
> -
> >> --
> >>>>> --
> >>>>>> -------===//
> >>>>>>> //  Integer Operand Promotion
> >>>>>>> //===----------------------------------------------------------
> -
> >> --
> >>>>> --
> >>>>>> -------===//
> >>>>>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
> >>>>>>
> >>>>>> This function really needs a comment explaining its purpose and
> >>>>> perhaps
> >>>>>> a different name. It's not really doing a full expand of the
> >>> extend,
> >>>>>> but rather doing some housekeeping beforehand to make sure the
> >>>>>> expansion that results from the Promote*() call will be
> >> reasonable
> >>>>> for
> >>>>>> the target. In any case, some comments about the motivation and
> >>> logic
> >>>>>> would be very helpful.
> >>>>>>
> >>>>>>> +  EVT VT0 = N->getValueType(0);
> >>>>>>> +  EVT VT1 = N->getOperand(0).getValueType();
> >>>>>>> +  DebugLoc dl = N->getDebugLoc();
> >>>>>>> +  // We only split the node if the operation is not legal,
> >>>>> otherwise
> >>>>>> we could get
> >>>>>>> +  // a legalization loop.
> >>>>>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> >>>>>> TargetLowering::Legal)
> >>>>>>> +    return SDValue();
> >>>>>>>
> >>>>>>> +  if (!VT0.isVector())
> >>>>>>> +    return SDValue();
> >>>>>>> +
> >>>>>>> +  // We don't want expand if the intermediate type is the
> >> v2i16,
> >>>>>>> +  // since v2i16 is not a legal type. This only happens when
> >> the
> >>>>>>> +  // destination has the v2i32 type and the source is v2i8.
> >>>>>>> +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
> >>>>>>> +    return SDValue();
> >>>>>>
> >>>>>> This is in generic code now, so we can't make assumptions about
> >>> what
> >>>>>> types are legal. We need to ask the target. This function needs
> >> to
> >>> be
> >>>>>> made more generic, basically. For example, here we assume that
> >>> v2i16
> >>>>>> isn't legal. That's true for ARM, but quite possibly untrue for
> >>> other
> >>>>>> targets that support 32-bit vector types. Likewise, we may need
> >> to
> >>>>>> check for which operations are legal as well as which types.
> >>>>>>
> >>>>>>> +  // If there is already an instruction that could be directly
> >>>>>>> +  // selected for this there is no reason to expand since it
> >> will
> >>>>>>> +  // generate a redundant expand instruction (from type X to
> >> type
> >>>>>> X).
> >>>>>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> >>>>>>> +    return SDValue();
> >>>>>>> +
> >>>>>>> +  LLVMContext *Context = DAG.getContext();
> >>>>>>> +
> >>>>>>> +  // We need to get the intermediate type. This type will be
> >>> twice
> >>>>>> as
> >>>>>>> +  // small as the destination type, since all expand
> >> instructions
> >>>>>> are
> >>>>>>> +  // from a length X to a length 2*X and we have expand
> >>>>> instructions
> >>>>>>> +  // for all destination types.
> >>>>>>> +  EVT middleType = EVT::getVectorVT(*Context,
> >>>>>>> +                          EVT::getIntegerVT(*Context,
> >>>>>>> +
> >>>>> VT0.getScalarType().getSizeInBits()
> >>>>>> / 2),
> >>>>>>> +                          VT0.getVectorNumElements());
> >>>>>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
> >>>>>>> +                     DAG.getNode(N->getOpcode(), dl,
> >>>>>>> +                                 middleType, N-
> >>> getOperand(0)));
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> /// PromoteIntegerOperand - This method is called when the
> >>>>> specified
> >>>>>> operand of
> >>>>>>> /// the specified node is found to need promotion.  At this
> >> point,
> >>>>>> all of the
> >>>>>>> /// result types of the node are known to be legal, but other
> >>>>>> operands of the
> >>>>>>> @@ -752,7 +789,10 @@
> >>>>>>>  #endif
> >>>>>>>    llvm_unreachable("Do not know how to promote this operator's
> >>>>>> operand!");
> >>>>>>>
> >>>>>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
> >>> break;
> >>>>>>> +  case ISD::ANY_EXTEND:
> >>>>>>> +    Res = ExpandExtend(N);
> >>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> >>>>>>> +    break;
> >>>>>>>  case ISD::ATOMIC_STORE:
> >>>>>>>    Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> >>>>>>>    break;
> >>>>>>> @@ -774,15 +814,20 @@
> >>>>>>>  case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
> >>>>> break;
> >>>>>>>  case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
> >>>>>> break;
> >>>>>>>  case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
> >>>>> break;
> >>>>>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
> >>>>> break;
> >>>>>>> +  case ISD::SIGN_EXTEND:
> >>>>>>> +    Res = ExpandExtend(N);
> >>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> >>>>>>> +    break;
> >>>>>>>  case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N);
> >> break;
> >>>>>>>  case ISD::STORE:        Res =
> >>>>>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> >>>>>>>                                                   OpNo); break;
> >>>>>>>  case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
> >>>>>>>  case ISD::FP16_TO_FP32:
> >>>>>>>  case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N);
> >> break;
> >>>>>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
> >>>>> break;
> >>>>>>> -
> >>>>>>> +  case ISD::ZERO_EXTEND:
> >>>>>>> +    Res = ExpandExtend(N);
> >>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> >>>>>>> +    break;
> >>>>>>>  case ISD::SHL:
> >>>>>>>  case ISD::SRA:
> >>>>>>>  case ISD::SRL:
> >>>>>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> >>>>>>>
> >>> ===================================================================
> >>>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (revision
> 164117)
> >>>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (working copy)
> >>>>>>> @@ -2565,7 +2565,8 @@
> >>>>>>>            VT.getVectorNumElements() ==
> >>>>>>>            Operand.getValueType().getVectorNumElements()) &&
> >>>>>>>           "Vector element count mismatch!");
> >>>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>>>>> ISD::ZERO_EXTEND)
> >>>>>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> >>>>>> ISD::ZERO_EXTEND) &&
> >>>>>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> >>>>>>>      return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>>>>> getOperand(0));
> >>>>>>>    else if (OpOpcode == ISD::UNDEF)
> >>>>>>>      // sext(undef) = 0, because the top bits will all be the
> >>>>> same.
> >>>>>>> @@ -2581,7 +2582,9 @@
> >>>>>>>            VT.getVectorNumElements() ==
> >>>>>>>            Operand.getValueType().getVectorNumElements()) &&
> >>>>>>>           "Vector element count mismatch!");
> >>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
> >>>>> (zext
> >>>>>> x)
> >>>>>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
> >>>>>>> +        TLI.isOperationLegal(OpOpcode,
> Operand.getValueType()))
> >>>>>>> +      // (zext (zext x)) -> (zext x)
> >>>>>>>      return getNode(ISD::ZERO_EXTEND, DL, VT,
> >>>>>>>                     Operand.getNode()->getOperand(0));
> >>>>>>>    else if (OpOpcode == ISD::UNDEF)
> >>>>>>> @@ -2599,8 +2602,9 @@
> >>>>>>>            Operand.getValueType().getVectorNumElements()) &&
> >>>>>>>           "Vector element count mismatch!");
> >>>>>>>
> >>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >>>>> ISD::SIGN_EXTEND
> >>>>>> ||
> >>>>>>> -        OpOpcode == ISD::ANY_EXTEND)
> >>>>>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> >>>>>> ISD::SIGN_EXTEND ||
> >>>>>>> +        OpOpcode == ISD::ANY_EXTEND) &&
> >>>>>>> +        TLI.isOperationLegal(OpOpcode,
> Operand.getValueType()))
> >>>>>>>      // (ext (zext x)) -> (zext x)  and  (ext (sext x)) ->
> >> (sext
> >>>>> x)
> >>>>>>>      return getNode(OpOpcode, DL, VT, Operand.getNode()-
> >>>>>>> getOperand(0));
> >>>>>>>    else if (OpOpcode == ISD::UNDEF)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com>
> >>> wrote:
> >>>>>>
> >>>>>>> Hi Jim,
> >>>>>>>
> >>>>>>> Thanks for the review. It does make sense to do this in a
> target
> >>>>>>> independent way. I've attached a new patch that does this.
> >>>>>>>
> >>>>>>> I've moved the code ARM-specific code into the legalizer and
> >> also
> >>>>>>> added tests for sext nodes. It appears that I was mistaken
> >> before
> >>>>>>> and (sext (sextload)) patterns were not prevented from folding
> >> for
> >>>>>>> vector types. This was stopping the generation of vaddl-type
> >>>>>>> instructions. I think that stopping the folding of (sext
> >>>>> (sextload))
> >>>>>>> patterns is still OK, because the merging would result in a
> >>> illegal
> >>>>>>> operation.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Silviu
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>>>>> Sent: 17 September 2012 18:09
> >>>>>>>> To: Silviu Baranga
> >>>>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >>>>> vaddl-
> >>>>>> type
> >>>>>>>> instructions from extloads on ARM
> >>>>>>>>
> >>>>>>>> Hi Silviu,
> >>>>>>>>
> >>>>>>>> Thanks for the update. This is definitely much better. Why do
> >> we
> >>>>>> need
> >>>>>>>> any ARM specific code here, though? Given the problem
> >>> description,
> >>>>>> it
> >>>>>>>> seems like this is something the legalizer should be able to
> >>>>> reason
> >>>>>>>> about without needing custom handling in the target.
> >>>>>>>>
> >>>>>>>> -Jim
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Jim,
> >>>>>>>>>
> >>>>>>>>> Thanks for the review. The BREAK node solution was a bit
> ugly.
> >>>>>>>>>
> >>>>>>>>> I've attached a new patch that solves this without a BREAK
> >> node.
> >>>>>>>>> This is done by stopping the folding of nodes zext nodes when
> >>>>>>>>> this will result in a non-legal operation.
> >>>>>>>>>
> >>>>>>>>> I also had to stop the (zext (zextload x)) folding for vector
> >>>>>>>>> types. This is already being done for sext nodes.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Silviu
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> >>>>>>>>>> Sent: 06 September 2012 01:42
> >>>>>>>>>> To: Silviu Baranga
> >>>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> >>>>>> vaddl-
> >>>>>>>> type
> >>>>>>>>>> instructions from extloads on ARM
> >>>>>>>>>>
> >>>>>>>>>> Hi Silviu,
> >>>>>>>>>>
> >>>>>>>>>> I like the idea of teaching the compiler to use these
> >>>>>> instructions,
> >>>>>>>> but
> >>>>>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd
> >> much
> >>>>>>>> prefer
> >>>>>>>>>> we figure out how to teach the combiner and isel to be
> >> smarter
> >>>>>>>> directly
> >>>>>>>>>> rather than doing something like that.
> >>>>>>>>>>
> >>>>>>>>>> -Jim
> >>>>>>>>>>
> >>>>>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga
> <silbar01 at arm.com>
> >>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> Currently LLVM will not generate a vaddl-type instructions
> >>> when
> >>>>>> an
> >>>>>>>>>> extload
> >>>>>>>>>>> node is in the input dag. All vector extload nodes are
> >>>>>> implemented
> >>>>>>>>>> with
> >>>>>>>>>>> a load instruction and a lengthening instruction on ARM. If
> >>> the
> >>>>>>>>>> lengthening
> >>>>>>>>>>> instruction is followed by an add instruction, it would be
> >>>>> better
> >>>>>>>> to
> >>>>>>>>>>> generate
> >>>>>>>>>>> a vaddl-type instruction instead.
> >>>>>>>>>>>
> >>>>>>>>>>> To do this, we split all lengthening dag nodes which extend
> >> a
> >>>>>> type
> >>>>>>>> to
> >>>>>>>>>> more
> >>>>>>>>>>> then twice its size (this is OK because these nodes would
> be
> >>>>>>>>>> implemented by
> >>>>>>>>>>> multiple instructions anyway) into two lengthening nodes.
> >> One
> >>>>>> node
> >>>>>>>>>> will be
> >>>>>>>>>>> later
> >>>>>>>>>>> combined to form the extload node, and the other can be
> >>>>> selected
> >>>>>> as
> >>>>>>>>>> part of
> >>>>>>>>>>> a
> >>>>>>>>>>> vaddl-type instruction (there are already patterns for
> >> this).
> >>>>>>>>>>>
> >>>>>>>>>>> To stop the dag nodes from re-combining, we insert a BREAK
> >>> node
> >>>>>>>>>> between
> >>>>>>>>>>> them. The
> >>>>>>>>>>> BREAK node is used as an optimization barrier for the DAG
> >>>>>> combiner,
> >>>>>>>>>> and
> >>>>>>>>>>> might be
> >>>>>>>>>>> useful in other cases as well. We remove the BREAK node in
> >> the
> >>>>>>>>>> instruction
> >>>>>>>>>>> selection stage.
> >>>>>>>>>>>
> >>>>>>>>>>> This approach allows the reuse of existing selection
> >> patterns.
> >>>>>>>>>>>
> >>>>>>>>>>> This patch depends on the following other patch:
> >>>>>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-
> Mon-
> >>>>>>>>>> 20120827/149201.
> >>>>>>>>>>> html
> >>>>>>>>>>>
> >>>>>>>>>>> Please review!
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>> Silviu<patch.diff>_______________________________________________
> >>>>>>>>>>> llvm-commits mailing list
> >>>>>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>> <zext_split.diff>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> <zext_split.diff>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >
> >
> >
> >
>


-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zext.diff
Type: application/octet-stream
Size: 13045 bytes
Desc: zext.diff
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121207/4f38c93d/attachment.obj>


More information about the llvm-commits mailing list