[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Fri Dec 7 08:39:07 PST 2012

Can you attach the patch?

Thanks,

Evan

Sent from my iPad

On Dec 7, 2012, at 4:33 AM, Silviu Baranga <silbar01 at arm.com> wrote:

> Ping!
> 
> --Silviu
> 
>> -----Original Message-----
>> From: Silviu Baranga [mailto:silbar01 at arm.com]
>> Sent: 26 November 2012 16:38
>> To: 'Jim Grosbach'
>> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
>> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-type
>> instructions from extloads on ARM
>> 
>> Hi Jim,
>> 
>> I finally got around to re-writing this patch. Sorry for taking
>> so long.
>> 
>> I've refactored this patch to address your previous comments.
>> The code now uses "isOperationLegalOrCustom" to decide if it
>> can introduce the new zext/sext node for the intermediary type.
>> 
>> On the ARM side I've set the zext action for 64 bit vectors to legal,
>> and everything else to expand. All extload-type vector actions
>> are also set to expand now, since there are no instructions that
>> implement this.
>> 
>> As for the "LowerExtendOp" name, I'm open to suggestions.
>> 
>> This also fixes PR11570.
>> 
>> Cheers,
>> Silviu
>> 
>>> -----Original Message-----
>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>> Sent: 02 October 2012 21:54
>>> To: Silviu Baranga
>>> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
>>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
>> type
>>> instructions from extloads on ARM
>>> 
>>> Hi Silviu,
>>> 
>>> Specific comments below.
>>> 
>>> Evan, you're more expert than I in this part of the code. There's
>>> something about this whole thing that feels a bit funny, but I can't
>>> put my finger on it effectively enough to be very helpful. Mind
>> having
>>> a look? It'll probably be obvious one way or the other to you.
>>> 
>>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
>>> 
>>> General comment: thank you for making sure to include test cases.
>> It's
>>> amazing how many patches don't do that...
>>> 
>>>> ===================================================================
>>>> --- test/CodeGen/ARM/vector-extend-narrow.ll    (revision 164117)
>>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll    (working copy)
>>>> @@ -73,3 +73,68 @@
>>>>   ret <4 x i32> %2
>>>> }
>>>> 
>>>> +; CHECK: k:
>>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>> +  ; CHECK: vld1.32
>>>> +  ; CHECK: vmovl.u8
>>>> +  ; CHECK: vaddw.u16
>>>> +  %1 = load <4 x i8>* %in, align 4
>>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
>>>> +  %3 = add <4 x i32> %2, %x
>>>> +  ret <4 x i32> %3
>>>> +}
>>>> +; CHECK: l:
>>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>> +  ; CHECK: vld1.16
>>>> +  ; CHECK: vmovl.u8
>>>> +  ; CHECK: vmovl.u16
>>>> +  ; CHECK: vaddw.u32
>>>> +  %1 = load <2 x i8>* %in, align 4
>>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
>>>> +  %3 = add <2 x i64> %2, %x
>>>> +  ret <2 x i64> %3
>>>> +}
>>>> +
>>>> +; CHECK: m:
>>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>> +  ; CHECK: vld1.32
>>>> +  ; CHECK: vmovl.u16
>>>> +  ; CHECK: vaddw.u32
>>>> +  %1 = load <2 x i16>* %in, align 4
>>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
>>>> +  %3 = add <2 x i64> %2, %x
>>>> +  ret <2 x i64> %3
>>>> +}
>>>> +
>>>> +; CHECK: n:
>>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>> +  ; CHECK: vld1.32
>>>> +  ; CHECK: vmovl.s8
>>>> +  ; CHECK: vaddw.s16
>>>> +  %1 = load <4 x i8>* %in, align 4
>>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
>>>> +  %3 = add <4 x i32> %2, %x
>>>> +  ret <4 x i32> %3
>>>> +}
>>>> +; CHECK: o:
>>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>> +  ; CHECK: vld1.16
>>>> +  ; CHECK: vmovl.s8
>>>> +  ; CHECK: vmovl.s16
>>>> +  ; CHECK: vaddw.s32
>>>> +  %1 = load <2 x i8>* %in, align 4
>>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
>>>> +  %3 = add <2 x i64> %2, %x
>>>> +  ret <2 x i64> %3
>>>> +}
>>>> +
>>>> +; CHECK: p:
>>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>> +  ; CHECK: vld1.32
>>>> +  ; CHECK: vmovl.s16
>>>> +  ; CHECK: vaddw.s32
>>>> +  %1 = load <2 x i16>* %in, align 4
>>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
>>>> +  %3 = add <2 x i64> %2, %x
>>>> +  ret <2 x i64> %3
>>>> +}
>>>> Index: lib/Target/ARM/ARMISelLowering.h
>>>> ===================================================================
>>>> --- lib/Target/ARM/ARMISelLowering.h    (revision 164117)
>>>> +++ lib/Target/ARM/ARMISelLowering.h    (working copy)
>>>> @@ -403,6 +406,8 @@
>>>>     void addDRTypeForNEON(MVT VT);
>>>>     void addQRTypeForNEON(MVT VT);
>>>> 
>>>> +    void addExtendLegalizeTypeForNEON(MVT VT);
>>>> +
>>>>     typedef SmallVector<std::pair<unsigned, SDValue>, 8>
>>> RegsToPassVector;
>>>>     void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
>>>>                           SDValue Chain, SDValue &Arg,
>>>> Index: lib/Target/ARM/ARMISelLowering.cpp
>>>> ===================================================================
>>>> --- lib/Target/ARM/ARMISelLowering.cpp    (revision 164117)
>>>> +++ lib/Target/ARM/ARMISelLowering.cpp    (working copy)
>>>> @@ -158,6 +158,15 @@
>>>>   addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
>>>> }
>>>> 
>>>> +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
>>>> +      setLoadExtAction(ISD::EXTLOAD, VT, Legal);
>>>> +      setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
>>>> +      setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
>>>> +      setOperationAction(ISD::ANY_EXTEND, VT, Expand);
>>>> +      setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
>>>> +      setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
>>>> +}
>>> 
>>> Formatting. Only two space indentation required.
>>> 
>>>> +
>>>> static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
>>>>   if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
>>>>     return new TargetLoweringObjectFileMachO();
>>>> @@ -556,15 +565,17 @@
>>>>     setTargetDAGCombine(ISD::FP_TO_UINT);
>>>>     setTargetDAGCombine(ISD::FDIV);
>>>> 
>>>> -    // It is legal to extload from v4i8 to v4i16 or v4i32.
>>>> -    MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
>>>> -                  MVT::v4i16, MVT::v2i16,
>>>> -                  MVT::v2i32};
>>>> -    for (unsigned i = 0; i < 6; ++i) {
>>>> -      setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
>>>> -      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
>>>> -      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
>>>> -    }
>>>> +    // There is no instruction to expand v4i8 to v4i32 and
>>>> +    // we must do this with two vmovl instructions. However,
>>>> +    // the vmovl instructions can be combined with other
>>>> +    // lengthening instructions, so we would prefer to lower
>>>> +    // (zextload x) -> (zext (zextload x).
>>>> +    addExtendLegalizeTypeForNEON(MVT::v8i8);
>>>> +    addExtendLegalizeTypeForNEON(MVT::v4i8);
>>>> +    addExtendLegalizeTypeForNEON(MVT::v2i8);
>>>> +    addExtendLegalizeTypeForNEON(MVT::v4i16);
>>>> +    addExtendLegalizeTypeForNEON(MVT::v2i16);
>>>> +    addExtendLegalizeTypeForNEON(MVT::v2i32);
>>>>   }
>>>> 
>>>>   // ARM and Thumb2 support UMLAL/SMLAL.
>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (revision 164117)
>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (working copy)
>>>> @@ -4342,7 +4342,7 @@
>>>>   // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
>>>>   // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
>>>>   if ((ISD::isSEXTLoad(N0.getNode()) ||
>>> ISD::isEXTLoad(N0.getNode())) &&
>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>> !VT.isVector()) {
>>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>     EVT MemVT = LN0->getMemoryVT();
>>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
>>>> @@ -4678,7 +4678,7 @@
>>>>   // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
>>>>   // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
>>>>   if ((ISD::isZEXTLoad(N0.getNode()) ||
>>> ISD::isEXTLoad(N0.getNode())) &&
>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>> !VT.isVector()) {
>>>>     LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>     EVT MemVT = LN0->getMemoryVT();
>>>>     if ((!LegalOperations && !LN0->isVolatile()) ||
>>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h    (revision 164117)
>>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h    (working copy)
>>>> @@ -287,6 +287,8 @@
>>>>   SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
>>>>   SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>>>> 
>>>> +  SDValue LowerExtendOp(SDNode *N);
>>>> +
>>>>   void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
>> ISD::CondCode
>>> Code);
>>>> 
>>>>   //===-----------------------------------------------------------
>> --
>>> -------===//
>>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp    (revision
>>> 164117)
>>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp    (working
>>> copy)
>>>> @@ -733,6 +733,51 @@
>>>> //  Integer Operand Promotion
>>>> //===-------------------------------------------------------------
>> --
>>> -------===//
>>>> 
>>>> +// LowerExtendOp - This function breaks up vector zext/sext nodes
>> so
>>> that the
>>>> +// result will more closely resemble the operations available on
>> the
>>> target.
>>>> +// The resulting nodes can be each combined with other nodes to
>>> create legal
>>>> +// operations.
>>> 
>>> Some addition questions below, but in general, this could probably
>> use
>>> a bit more description in the comment about what's actually happening
>>> here. Why this transform is a good idea and exactly what the
>> transform
>>> is, in particular.
>>> 
>>>> +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
>>> 
>>> Is "Lower" really the right name for this function?
>>> 
>>>> +  EVT VT0 = N->getValueType(0);
>>>> +  EVT VT1 = N->getOperand(0).getValueType();
>>>> +  DebugLoc dl = N->getDebugLoc();
>>>> +  // We only split the node if the operation is not legal,
>> otherwise
>>> we could get
>>>> +  // a legalization loop.
>>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
>>> TargetLowering::Legal)
>>> 
>>> You probably just want TLI.isOperationLegalOrCustom() here.
>>> 
>>>> +    return SDValue();
>>>> +
>>>> +  if (!VT0.isVector())
>>>> +    return SDValue();
>>> 
>>> Minor note, but I'd put this check first as it's cheaper and more
>>> likely to fire than the check for the operation being legal.
>>> 
>>>> +
>>>> +  // If there is already an instruction that could be directly
>>>> +  // selected for this there is no reason to expand since it will
>>>> +  // generate a redundant expand instruction (from type X to type
>>> X).
>>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
>>>> +    return SDValue();
>>> 
>>> I don't follow this. It seems pretty arbitrary. The comment doesn't
>>> help much, since how do we know what instructions could be selected
>> or
>>> not? Can you clarify?
>>> 
>>>> +
>>>> +  LLVMContext *Context = DAG.getContext();
>>> 
>>> Can just have this be a reference since that's how it's used below
>> and
>>> we never can have a NULL context here.
>>> 
>>>> +
>>>> +  // We need to get the intermediate type. This type will be twice
>>> as
>>>> +  // small as the destination type,
>>> 
>>> s/twice as small as/half the size of/
>>> 
>>>> since all expand instructions are
>>>> +  // from a length X to a length 2*X and we have expand
>> instructions
>>>> +  // for all destination types.
>>> 
>>> Not sure I understand what you mean by "we have expand instructions
>> for
>>> all destination types." Can you clarify?
>>> 
>>>> +  EVT middleType = EVT::getVectorVT(*Context,
>>> 
>>> Variable names should start with a Capitol letter. It's just function
>>> names that are camel-case starting w/ lower case. That said, it's
>>> unclear what "middle" means here. Perhaps a clearer name or a
>> comment?
>>> 
>>>> +                          EVT::getIntegerVT(*Context,
>>>> +
>> VT0.getScalarType().getSizeInBits()
>>> / 2),
>>>> +                          VT0.getVectorNumElements());
>>>> +
>>>> +  // We don't want to expand if we can't use a legal intermediate
>>> type
>>>> +  // since there wouldn't be any benefit from this and we might
>> end
>>>> +  // up with nodes that cannot be selected in the instruction
>>> selection
>>>> +  // stage.
>>>> +  if (!TLI.isTypeLegal(middleType))
>>>> +    return SDValue();
>>>> +
>>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
>>>> +                     DAG.getNode(N->getOpcode(), dl,
>>>> +                                 middleType, N->getOperand(0)));
>>>> +}
>>>> +
>>>> /// PromoteIntegerOperand - This method is called when the
>> specified
>>> operand of
>>>> /// the specified node is found to need promotion.  At this point,
>>> all of the
>>>> /// result types of the node are known to be legal, but other
>>> operands of the
>>>> @@ -752,7 +797,10 @@
>>>>   #endif
>>>>     llvm_unreachable("Do not know how to promote this operator's
>>> operand!");
>>>> 
>>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N); break;
>>>> +  case ISD::ANY_EXTEND:
>>>> +    Res = LowerExtendOp(N);
>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
>>>> +    break;
>>>>   case ISD::ATOMIC_STORE:
>>>>     Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
>>>>     break;
>>>> @@ -774,15 +822,20 @@
>>>>   case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
>> break;
>>>>   case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
>>> break;
>>>>   case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
>> break;
>>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
>> break;
>>>> +  case ISD::SIGN_EXTEND:
>>>> +    Res = LowerExtendOp(N);
>>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
>>>> +    break;
>>>>   case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
>>>>   case ISD::STORE:        Res =
>>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
>>>>                                                    OpNo); break;
>>>>   case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
>>>>   case ISD::FP16_TO_FP32:
>>>>   case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
>>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
>> break;
>>>> -
>>>> +  case ISD::ZERO_EXTEND:
>>>> +    Res = LowerExtendOp(N);
>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
>>>> +    break;
>>>>   case ISD::SHL:
>>>>   case ISD::SRA:
>>>>   case ISD::SRL:
>>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (revision 164117)
>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (working copy)
>>>> @@ -2565,7 +2565,8 @@
>>>>             VT.getVectorNumElements() ==
>>>>             Operand.getValueType().getVectorNumElements()) &&
>>>>            "Vector element count mismatch!");
>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>> ISD::ZERO_EXTEND)
>>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>> ISD::ZERO_EXTEND) &&
>>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>> getOperand(0));
>>>>     else if (OpOpcode == ISD::UNDEF)
>>>>       // sext(undef) = 0, because the top bits will all be the
>> same.
>>>> @@ -2581,7 +2582,9 @@
>>>>             VT.getVectorNumElements() ==
>>>>             Operand.getValueType().getVectorNumElements()) &&
>>>>            "Vector element count mismatch!");
>>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
>> (zext
>>> x)
>>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
>>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>> +      // (zext (zext x)) -> (zext x)
>>>>       return getNode(ISD::ZERO_EXTEND, DL, VT,
>>>>                      Operand.getNode()->getOperand(0));
>>>>     else if (OpOpcode == ISD::UNDEF)
>>>> @@ -2599,8 +2602,9 @@
>>>>             Operand.getValueType().getVectorNumElements()) &&
>>>>            "Vector element count mismatch!");
>>>> 
>>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>> ISD::SIGN_EXTEND
>>> ||
>>>> -        OpOpcode == ISD::ANY_EXTEND)
>>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>> ISD::SIGN_EXTEND ||
>>>> +        OpOpcode == ISD::ANY_EXTEND) &&
>>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>       // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext
>> x)
>>>>       return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>> getOperand(0));
>>>>     else if (OpOpcode == ISD::UNDEF)
>>> 
>>> On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com> wrote:
>>> 
>>>> Ping.
>>>> 
>>>>> -----Original Message-----
>>>>> From: Silviu Baranga [mailto:silbar01 at arm.com]
>>>>> Sent: 20 September 2012 15:48
>>>>> To: 'Jim Grosbach'
>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>> Subject: RE: [llvm-commits] [patch] Enable the generation of
>> vaddl-
>>> type
>>>>> instructions from extloads on ARM
>>>>> 
>>>>> Hi Jim,
>>>>> 
>>>>> Thanks for the comments! I am attaching a new patch with the
>>>>> changes applied.
>>>>> 
>>>>> To make the code target-independent I've added the check that
>>>>> the 'middleType' is legal, which is equivalent to the previous
>>>>> ARM specific check. I didn't check if the operation is legal,
>>>>> since if the code got there it is marked as 'Expand' anyway.
>>>>> 
>>>>> Cheers,
>>>>> Silviu
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>> Sent: 19 September 2012 17:56
>>>>>> To: Silviu Baranga
>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>> vaddl-
>>>>> type
>>>>>> instructions from extloads on ARM
>>>>>> 
>>>>>> Hi Silviu,
>>>>>> 
>>>>>> Making good progress! A few comments on the updated patch inline.
>>>>>> 
>>>>>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
>>>>>>> 
>>> ===================================================================
>>>>>>> --- test/CodeGen/ARM/vector-extend-narrow.ll    (revision 164117)
>>>>>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll    (working copy)
>>>>>>> @@ -73,3 +73,68 @@
>>>>>>>  ret <4 x i32> %2
>>>>>>> }
>>>>>>> 
>>>>>>> +; CHECK: k:
>>>>>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>>>>> +  ; CHECK: vld1.32
>>>>>>> +  ; CHECK: vmovl.u8
>>>>>>> +  ; CHECK: vaddw.u16
>>>>>>> +  %1 = load <4 x i8>* %in, align 4
>>>>>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
>>>>>>> +  %3 = add <4 x i32> %2, %x
>>>>>>> +  ret <4 x i32> %3
>>>>>>> +}
>>>>>>> +; CHECK: l:
>>>>>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>>>>> +  ; CHECK: vld1.16
>>>>>>> +  ; CHECK: vmovl.u8
>>>>>>> +  ; CHECK: vmovl.u16
>>>>>>> +  ; CHECK: vaddw.u32
>>>>>>> +  %1 = load <2 x i8>* %in, align 4
>>>>>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>> +  ret <2 x i64> %3
>>>>>>> +}
>>>>>>> +
>>>>>>> +; CHECK: m:
>>>>>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>>>>> +  ; CHECK: vld1.32
>>>>>>> +  ; CHECK: vmovl.u16
>>>>>>> +  ; CHECK: vaddw.u32
>>>>>>> +  %1 = load <2 x i16>* %in, align 4
>>>>>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>> +  ret <2 x i64> %3
>>>>>>> +}
>>>>>>> +
>>>>>>> +; CHECK: n:
>>>>>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>>>>> +  ; CHECK: vld1.32
>>>>>>> +  ; CHECK: vmovl.s8
>>>>>>> +  ; CHECK: vaddw.s16
>>>>>>> +  %1 = load <4 x i8>* %in, align 4
>>>>>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
>>>>>>> +  %3 = add <4 x i32> %2, %x
>>>>>>> +  ret <4 x i32> %3
>>>>>>> +}
>>>>>>> +; CHECK: o:
>>>>>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>>>>> +  ; CHECK: vld1.16
>>>>>>> +  ; CHECK: vmovl.s8
>>>>>>> +  ; CHECK: vmovl.s16
>>>>>>> +  ; CHECK: vaddw.s32
>>>>>>> +  %1 = load <2 x i8>* %in, align 4
>>>>>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>> +  ret <2 x i64> %3
>>>>>>> +}
>>>>>>> +
>>>>>>> +; CHECK: p:
>>>>>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>>>>> +  ; CHECK: vld1.32
>>>>>>> +  ; CHECK: vmovl.s16
>>>>>>> +  ; CHECK: vaddw.s32
>>>>>>> +  %1 = load <2 x i16>* %in, align 4
>>>>>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>> +  ret <2 x i64> %3
>>>>>>> +}
>>>>>>> Index: lib/Target/ARM/ARMISelLowering.cpp
>>>>>>> 
>>> ===================================================================
>>>>>>> --- lib/Target/ARM/ARMISelLowering.cpp    (revision 164117)
>>>>>>> +++ lib/Target/ARM/ARMISelLowering.cpp    (working copy)
>>>>>>> @@ -565,6 +570,12 @@
>>>>>>>      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
>>>>>>>      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
>>>>>>>    }
>>>>>>> +
>>>>>>> +    for (unsigned i = 0; i < 6; ++i) {
>>>>>>> +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
>>>>>>> +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
>>>>>>> +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
>>>>>>> +    }
>>>>>>>  }
>>>>>> 
>>>>>> I'd have sworn I commented about this bit before, but I can't
>> find
>>> it
>>>>>> in my email logs.
>>>>>> 
>>>>>> Anyways, This, and the loop above, should be moved out to a
>> helper
>>>>>> function and the loop removed. That is, the helper should perform
>>> the
>>>>>> setOperationAction() calls for a single type, then the top level
>>>>>> function just calls that helper 6 times, once for each MVT. The
>>> NEON
>>>>>> bits for setting legal types do something similar.
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>  // ARM and Thumb2 support UMLAL/SMLAL.
>>>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>>>>> 
>>> ===================================================================
>>>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (revision 164117)
>>>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (working copy)
>>>>>>> @@ -4342,7 +4342,7 @@
>>>>>>>  // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
>>>>>>>  // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
>>>>>>>  if ((ISD::isSEXTLoad(N0.getNode()) ||
>>>>>> ISD::isEXTLoad(N0.getNode())) &&
>>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>>>>> !VT.isVector()) {
>>>>>>>    LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>>>>    EVT MemVT = LN0->getMemoryVT();
>>>>>>>    if ((!LegalOperations && !LN0->isVolatile()) ||
>>>>>>> @@ -4678,7 +4678,7 @@
>>>>>>>  // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
>>>>>>>  // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
>>>>>>>  if ((ISD::isZEXTLoad(N0.getNode()) ||
>>>>>> ISD::isEXTLoad(N0.getNode())) &&
>>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>>>>> !VT.isVector()) {
>>>>>>>    LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>>>>    EVT MemVT = LN0->getMemoryVT();
>>>>>>>    if ((!LegalOperations && !LN0->isVolatile()) ||
>>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
>>>>>>> 
>>> ===================================================================
>>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h    (revision 164117)
>>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h    (working copy)
>>>>>>> @@ -287,6 +287,8 @@
>>>>>>>  SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
>>>>>>>  SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>>>>>>> 
>>>>>>> +  SDValue ExpandExtend(SDNode *N);
>>>>>>> +
>>>>>>>  void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
>>>>> ISD::CondCode
>>>>>> Code);
>>>>>>> 
>>>>>>>  //===---------------------------------------------------------
>> --
>>>>> --
>>>>>> -------===//
>>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>>>>> 
>>> ===================================================================
>>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>    (revision
>>>>>> 164117)
>>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>    (working
>>>>>> copy)
>>>>>>> @@ -732,7 +732,44 @@
>>>>>>> //===-----------------------------------------------------------
>> --
>>>>> --
>>>>>> -------===//
>>>>>>> //  Integer Operand Promotion
>>>>>>> //===-----------------------------------------------------------
>> --
>>>>> --
>>>>>> -------===//
>>>>>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
>>>>>> 
>>>>>> This function really needs a comment explaining its purpose and
>>>>> perhaps
>>>>>> a different name. It's not really doing a full expand of the
>>> extend,
>>>>>> but rather doing some housekeeping beforehand to make sure the
>>>>>> expansion that results from the Promote*() call will be
>> reasonable
>>>>> for
>>>>>> the target. In any case, some comments about the motivation and
>>> logic
>>>>>> would be very helpful.
>>>>>> 
>>>>>>> +  EVT VT0 = N->getValueType(0);
>>>>>>> +  EVT VT1 = N->getOperand(0).getValueType();
>>>>>>> +  DebugLoc dl = N->getDebugLoc();
>>>>>>> +  // We only split the node if the operation is not legal,
>>>>> otherwise
>>>>>> we could get
>>>>>>> +  // a legalization loop.
>>>>>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
>>>>>> TargetLowering::Legal)
>>>>>>> +    return SDValue();
>>>>>>> 
>>>>>>> +  if (!VT0.isVector())
>>>>>>> +    return SDValue();
>>>>>>> +
>>>>>>> +  // We don't want expand if the intermediate type is the
>> v2i16,
>>>>>>> +  // since v2i16 is not a legal type. This only happens when
>> the
>>>>>>> +  // destination has the v2i32 type and the source is v2i8.
>>>>>>> +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
>>>>>>> +    return SDValue();
>>>>>> 
>>>>>> This is in generic code now, so we can't make assumptions about
>>> what
>>>>>> types are legal. We need to ask the target. This function needs
>> to
>>> be
>>>>>> made more generic, basically. For example, here we assume that
>>> v2i16
>>>>>> isn't legal. That's true for ARM, but quite possibly untrue for
>>> other
>>>>>> targets that support 32-bit vector types. Likewise, we may need
>> to
>>>>>> check for which operations are legal as well as which types.
>>>>>> 
>>>>>>> +  // If there is already an instruction that could be directly
>>>>>>> +  // selected for this there is no reason to expand since it
>> will
>>>>>>> +  // generate a redundant expand instruction (from type X to
>> type
>>>>>> X).
>>>>>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
>>>>>>> +    return SDValue();
>>>>>>> +
>>>>>>> +  LLVMContext *Context = DAG.getContext();
>>>>>>> +
>>>>>>> +  // We need to get the intermediate type. This type will be
>>> twice
>>>>>> as
>>>>>>> +  // small as the destination type, since all expand
>> instructions
>>>>>> are
>>>>>>> +  // from a length X to a length 2*X and we have expand
>>>>> instructions
>>>>>>> +  // for all destination types.
>>>>>>> +  EVT middleType = EVT::getVectorVT(*Context,
>>>>>>> +                          EVT::getIntegerVT(*Context,
>>>>>>> +
>>>>> VT0.getScalarType().getSizeInBits()
>>>>>> / 2),
>>>>>>> +                          VT0.getVectorNumElements());
>>>>>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
>>>>>>> +                     DAG.getNode(N->getOpcode(), dl,
>>>>>>> +                                 middleType, N-
>>> getOperand(0)));
>>>>>>> +}
>>>>>>> +
>>>>>>> /// PromoteIntegerOperand - This method is called when the
>>>>> specified
>>>>>> operand of
>>>>>>> /// the specified node is found to need promotion.  At this
>> point,
>>>>>> all of the
>>>>>>> /// result types of the node are known to be legal, but other
>>>>>> operands of the
>>>>>>> @@ -752,7 +789,10 @@
>>>>>>>  #endif
>>>>>>>    llvm_unreachable("Do not know how to promote this operator's
>>>>>> operand!");
>>>>>>> 
>>>>>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
>>> break;
>>>>>>> +  case ISD::ANY_EXTEND:
>>>>>>> +    Res = ExpandExtend(N);
>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
>>>>>>> +    break;
>>>>>>>  case ISD::ATOMIC_STORE:
>>>>>>>    Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
>>>>>>>    break;
>>>>>>> @@ -774,15 +814,20 @@
>>>>>>>  case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
>>>>> break;
>>>>>>>  case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
>>>>>> break;
>>>>>>>  case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
>>>>> break;
>>>>>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
>>>>> break;
>>>>>>> +  case ISD::SIGN_EXTEND:
>>>>>>> +    Res = ExpandExtend(N);
>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
>>>>>>> +    break;
>>>>>>>  case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N);
>> break;
>>>>>>>  case ISD::STORE:        Res =
>>>>>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
>>>>>>>                                                   OpNo); break;
>>>>>>>  case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
>>>>>>>  case ISD::FP16_TO_FP32:
>>>>>>>  case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N);
>> break;
>>>>>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
>>>>> break;
>>>>>>> -
>>>>>>> +  case ISD::ZERO_EXTEND:
>>>>>>> +    Res = ExpandExtend(N);
>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
>>>>>>> +    break;
>>>>>>>  case ISD::SHL:
>>>>>>>  case ISD::SRA:
>>>>>>>  case ISD::SRL:
>>>>>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>>>>> 
>>> ===================================================================
>>>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (revision 164117)
>>>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (working copy)
>>>>>>> @@ -2565,7 +2565,8 @@
>>>>>>>            VT.getVectorNumElements() ==
>>>>>>>            Operand.getValueType().getVectorNumElements()) &&
>>>>>>>           "Vector element count mismatch!");
>>>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>>>>> ISD::ZERO_EXTEND)
>>>>>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>>>>> ISD::ZERO_EXTEND) &&
>>>>>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>>>>      return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>>>>> getOperand(0));
>>>>>>>    else if (OpOpcode == ISD::UNDEF)
>>>>>>>      // sext(undef) = 0, because the top bits will all be the
>>>>> same.
>>>>>>> @@ -2581,7 +2582,9 @@
>>>>>>>            VT.getVectorNumElements() ==
>>>>>>>            Operand.getValueType().getVectorNumElements()) &&
>>>>>>>           "Vector element count mismatch!");
>>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
>>>>> (zext
>>>>>> x)
>>>>>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
>>>>>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>>>> +      // (zext (zext x)) -> (zext x)
>>>>>>>      return getNode(ISD::ZERO_EXTEND, DL, VT,
>>>>>>>                     Operand.getNode()->getOperand(0));
>>>>>>>    else if (OpOpcode == ISD::UNDEF)
>>>>>>> @@ -2599,8 +2602,9 @@
>>>>>>>            Operand.getValueType().getVectorNumElements()) &&
>>>>>>>           "Vector element count mismatch!");
>>>>>>> 
>>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>>>> ISD::SIGN_EXTEND
>>>>>> ||
>>>>>>> -        OpOpcode == ISD::ANY_EXTEND)
>>>>>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>>>>> ISD::SIGN_EXTEND ||
>>>>>>> +        OpOpcode == ISD::ANY_EXTEND) &&
>>>>>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>>>>      // (ext (zext x)) -> (zext x)  and  (ext (sext x)) ->
>> (sext
>>>>> x)
>>>>>>>      return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>>>>> getOperand(0));
>>>>>>>    else if (OpOpcode == ISD::UNDEF)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com>
>>> wrote:
>>>>>> 
>>>>>>> Hi Jim,
>>>>>>> 
>>>>>>> Thanks for the review. It does make sense to do this in a target
>>>>>>> independent way. I've attached a new patch that does this.
>>>>>>> 
>>>>>>> I've moved the code ARM-specific code into the legalizer and
>> also
>>>>>>> added tests for sext nodes. It appears that I was mistaken
>> before
>>>>>>> and (sext (sextload)) patterns were not prevented from folding
>> for
>>>>>>> vector types. This was stopping the generation of vaddl-type
>>>>>>> instructions. I think that stopping the folding of (sext
>>>>> (sextload))
>>>>>>> patterns is still OK, because the merging would result in a
>>> illegal
>>>>>>> operation.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Silviu
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>>> Sent: 17 September 2012 18:09
>>>>>>>> To: Silviu Baranga
>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>>>>> vaddl-
>>>>>> type
>>>>>>>> instructions from extloads on ARM
>>>>>>>> 
>>>>>>>> Hi Silviu,
>>>>>>>> 
>>>>>>>> Thanks for the update. This is definitely much better. Why do
>> we
>>>>>> need
>>>>>>>> any ARM specific code here, though? Given the problem
>>> description,
>>>>>> it
>>>>>>>> seems like this is something the legalizer should be able to
>>>>> reason
>>>>>>>> about without needing custom handling in the target.
>>>>>>>> 
>>>>>>>> -Jim
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Jim,
>>>>>>>>> 
>>>>>>>>> Thanks for the review. The BREAK node solution was a bit ugly.
>>>>>>>>> 
>>>>>>>>> I've attached a new patch that solves this without a BREAK
>> node.
>>>>>>>>> This is done by stopping the folding of nodes zext nodes when
>>>>>>>>> this will result in a non-legal operation.
>>>>>>>>> 
>>>>>>>>> I also had to stop the (zext (zextload x)) folding for vector
>>>>>>>>> types. This is already being done for sext nodes.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Silviu
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>>>>> Sent: 06 September 2012 01:42
>>>>>>>>>> To: Silviu Baranga
>>>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>>>>>> vaddl-
>>>>>>>> type
>>>>>>>>>> instructions from extloads on ARM
>>>>>>>>>> 
>>>>>>>>>> Hi Silviu,
>>>>>>>>>> 
>>>>>>>>>> I like the idea of teaching the compiler to use these
>>>>>> instructions,
>>>>>>>> but
>>>>>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd
>> much
>>>>>>>> prefer
>>>>>>>>>> we figure out how to teach the combiner and isel to be
>> smarter
>>>>>>>> directly
>>>>>>>>>> rather than doing something like that.
>>>>>>>>>> 
>>>>>>>>>> -Jim
>>>>>>>>>> 
>>>>>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com>
>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> Currently LLVM will not generate a vaddl-type instructions
>>> when
>>>>>> an
>>>>>>>>>> extload
>>>>>>>>>>> node is in the input dag. All vector extload nodes are
>>>>>> implemented
>>>>>>>>>> with
>>>>>>>>>>> a load instruction and a lengthening instruction on ARM. If
>>> the
>>>>>>>>>> lengthening
>>>>>>>>>>> instruction is followed by an add instruction, it would be
>>>>> better
>>>>>>>> to
>>>>>>>>>>> generate
>>>>>>>>>>> a vaddl-type instruction instead.
>>>>>>>>>>> 
>>>>>>>>>>> To do this, we split all lengthening dag nodes which extend
>> a
>>>>>> type
>>>>>>>> to
>>>>>>>>>> more
>>>>>>>>>>> then twice its size (this is OK because these nodes would be
>>>>>>>>>> implemented by
>>>>>>>>>>> multiple instructions anyway) into two lengthening nodes.
>> One
>>>>>> node
>>>>>>>>>> will be
>>>>>>>>>>> later
>>>>>>>>>>> combined to form the extload node, and the other can be
>>>>> selected
>>>>>> as
>>>>>>>>>> part of
>>>>>>>>>>> a
>>>>>>>>>>> vaddl-type instruction (there are already patterns for
>> this).
>>>>>>>>>>> 
>>>>>>>>>>> To stop the dag nodes from re-combining, we insert a BREAK
>>> node
>>>>>>>>>> between
>>>>>>>>>>> them. The
>>>>>>>>>>> BREAK node is used as an optimization barrier for the DAG
>>>>>> combiner,
>>>>>>>>>> and
>>>>>>>>>>> might be
>>>>>>>>>>> useful in other cases as well. We remove the BREAK node in
>> the
>>>>>>>>>> instruction
>>>>>>>>>>> selection stage.
>>>>>>>>>>> 
>>>>>>>>>>> This approach allows the reuse of existing selection
>> patterns.
>>>>>>>>>>> 
>>>>>>>>>>> This patch depends on the following other patch:
>>>>>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-
>>>>>>>>>> 20120827/149201.
>>>>>>>>>>> html
>>>>>>>>>>> 
>>>>>>>>>>> Please review!
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>> Silviu<patch.diff>_______________________________________________
>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> <zext_split.diff>
>>>>>>>> 
>>>>>>>> 
>>>>>>> <zext_split.diff>
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
> 
> 
>