[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Thu Dec 13 20:24:20 PST 2012

On Dec 13, 2012, at 3:18 AM, Silviu Baranga <silviu.baranga at arm.com> wrote:

> Hi Evan,
> 
> I've attached a new patch. This should fix the typo and make the comments
> more clear. I've also renamed the LowerExtendOp function to LowerVectorExtendOp.
> 
>> I am confused by the comments. Can you give an concrete example what
>> transformation it would do?
> 
> I've modified the LowerVectorExtendOp comment to include an example.

+// LowerExtendOp - This function expands vector zext/sext nodes so that the

It should be LowerVectorExtendOp.

+// result will more closely resemble the operations available on the target.
+// For example if zext is legal for v4i32 and not legal for v2i8, this will

I think you mean v2i32, not v4i32?
+// split a (v2i64 zext (v2i8 x)) node into (v2i64 zext (v2i32 zext (v2i8 x)))

> 
>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
>> +                     DAG.getNode(N->getOpcode(), dl,
>> +                                 MiddleType, N->getOperand(0)));
>> 
>> Is it possible the returned node would be lowered by this routine
>> again?
> 
> Let's suppose that LowerVectorExtendOp will try to re-lower that node.
> Because of the way we select the MiddleType, the size of the input type
> will be half the size of the output type.
> Therefore, the following code will cause an early return in this case:
> 
> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> +    return SDValue();

This approach feels wrong to me. It seems arbitrary to extend it to a vector type with elements that are half the width of the result vector type. It seems like it should rely on a target hook which provides the legal vector type of the specified number of elements.

Overall the approach seems wrong to me. Is the idea to promote all of the source of vector extend to the same type to reduce the number of patterns? Targets already can specify operation action for certain opcodes to "Promote" and specify a promotion type. e.g. For x86:

    // Promote v32i8, v16i16, v8i32 select, and, or, xor to v4i64.                                                                                                                                                                         
    for (int i = MVT::v32i8; i != MVT::v4i64; ++i) {
      MVT VT = (MVT::SimpleValueType)i;

      // Do not attempt to promote non-256-bit vectors                                                                                                                                                                                    
      if (!VT.is256BitVector())
        continue;

      setOperationAction(ISD::AND,    VT, Promote);
      AddPromotedToType (ISD::AND,    VT, MVT::v4i64);
...

It seems like you want to do something similar for extend except it's the source that should be promoted. I'm not really certain what's the right approach. Nadav and Owen do you guys have suggestions?

Evan

> 
> So the node should not be re-lowered.
> 
> --Silviu
> 
> 
>> -----Original Message-----
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: 12 December 2012 23:14
>> To: Silviu Baranga
>> Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-type
>> instructions from extloads on ARM
>> 
>> Sorry about the late reply. It got buried in the inbox.
>> 
>> 
>> +// LowerExtendOp - This function breaks up vector zext/sext nodes so
>> that the
>> +// result will more closely resemble the operations available on the
>> target.
>> +// The resulting nodes can be each combined with other nodes to create
>> legal
>> +// operations.
>> +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
>> +  EVT VT0 = N->getValueType(0);
>> +  EVT VT1 = N->getOperand(0).getValueType();
>> +  DebugLoc dl = N->getDebugLoc();
>> +
>> +  if (!VT0.isVector())
>> +    return SDValue();
>> 
>> Is it true this routine only handles vector nodes? If so, the name and
>> the comment should both make the fact clear.
>> 
>> +// LowerExtendOp - This function breaks up vector zext/sext nodes so
>> that the
>> +// result will more closely resemble the operations available on the
>> target.
>> +// The resulting nodes can be each combined with other nodes to create
>> legal
>> +// operations.
>> +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
>> 
>> I am confused by the comments. Can you give an concrete example what
>> transformation it would do?
>> 
>> 
>> +  // We don't want to expand if we can't use a legal intermediate type
>> +  // since we have no reason to belevie that this would be beneficial.
>> Typo: believe
>> 
>> 
>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
>> +                     DAG.getNode(N->getOpcode(), dl,
>> +                                 MiddleType, N->getOperand(0)));
>> 
>> Is it possible the returned node would be lowered by this routine
>> again?
>> 
>> 
>> Evan
>> 
>> On Dec 7, 2012, at 8:41 AM, Silviu Baranga <silviu.baranga at arm.com>
>> wrote:
>> 
>>> Hi Evan,
>>> 
>>> Sure. The patch is attached to this email.
>>> 
>>> Thanks,
>>> Silviu
>>> 
>>>> -----Original Message-----
>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>>>> Sent: 07 December 2012 16:39
>>>> To: Silviu Baranga
>>>> Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu
>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
>> type
>>>> instructions from extloads on ARM
>>>> 
>>>> Can you attach the patch?
>>>> 
>>>> Thanks,
>>>> 
>>>> Evan
>>>> 
>>>> Sent from my iPad
>>>> 
>>>> On Dec 7, 2012, at 4:33 AM, Silviu Baranga <silbar01 at arm.com> wrote:
>>>> 
>>>>> Ping!
>>>>> 
>>>>> --Silviu
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Silviu Baranga [mailto:silbar01 at arm.com]
>>>>>> Sent: 26 November 2012 16:38
>>>>>> To: 'Jim Grosbach'
>>>>>> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
>>>>>> Subject: RE: [llvm-commits] [patch] Enable the generation of
>> vaddl-
>>>> type
>>>>>> instructions from extloads on ARM
>>>>>> 
>>>>>> Hi Jim,
>>>>>> 
>>>>>> I finally got around to re-writing this patch. Sorry for taking
>>>>>> so long.
>>>>>> 
>>>>>> I've refactored this patch to address your previous comments.
>>>>>> The code now uses "isOperationLegalOrCustom" to decide if it
>>>>>> can introduce the new zext/sext node for the intermediary type.
>>>>>> 
>>>>>> On the ARM side I've set the zext action for 64 bit vectors to
>>>> legal,
>>>>>> and everything else to expand. All extload-type vector actions
>>>>>> are also set to expand now, since there are no instructions that
>>>>>> implement this.
>>>>>> 
>>>>>> As for the "LowerExtendOp" name, I'm open to suggestions.
>>>>>> 
>>>>>> This also fixes PR11570.
>>>>>> 
>>>>>> Cheers,
>>>>>> Silviu
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>> Sent: 02 October 2012 21:54
>>>>>>> To: Silviu Baranga
>>>>>>> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>> vaddl-
>>>>>> type
>>>>>>> instructions from extloads on ARM
>>>>>>> 
>>>>>>> Hi Silviu,
>>>>>>> 
>>>>>>> Specific comments below.
>>>>>>> 
>>>>>>> Evan, you're more expert than I in this part of the code. There's
>>>>>>> something about this whole thing that feels a bit funny, but I
>>>> can't
>>>>>>> put my finger on it effectively enough to be very helpful. Mind
>>>>>> having
>>>>>>> a look? It'll probably be obvious one way or the other to you.
>>>>>>> 
>>>>>>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
>>>>>>> 
>>>>>>> General comment: thank you for making sure to include test cases.
>>>>>> It's
>>>>>>> amazing how many patches don't do that...
>>>>>>> 
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- test/CodeGen/ARM/vector-extend-narrow.ll    (revision
>> 164117)
>>>>>>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll    (working copy)
>>>>>>>> @@ -73,3 +73,68 @@
>>>>>>>> ret <4 x i32> %2
>>>>>>>> }
>>>>>>>> 
>>>>>>>> +; CHECK: k:
>>>>>>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>> +  ; CHECK: vmovl.u8
>>>>>>>> +  ; CHECK: vaddw.u16
>>>>>>>> +  %1 = load <4 x i8>* %in, align 4
>>>>>>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
>>>>>>>> +  %3 = add <4 x i32> %2, %x
>>>>>>>> +  ret <4 x i32> %3
>>>>>>>> +}
>>>>>>>> +; CHECK: l:
>>>>>>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>>>>>> +  ; CHECK: vld1.16
>>>>>>>> +  ; CHECK: vmovl.u8
>>>>>>>> +  ; CHECK: vmovl.u16
>>>>>>>> +  ; CHECK: vaddw.u32
>>>>>>>> +  %1 = load <2 x i8>* %in, align 4
>>>>>>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>> +  ret <2 x i64> %3
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +; CHECK: m:
>>>>>>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>> +  ; CHECK: vmovl.u16
>>>>>>>> +  ; CHECK: vaddw.u32
>>>>>>>> +  %1 = load <2 x i16>* %in, align 4
>>>>>>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>> +  ret <2 x i64> %3
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +; CHECK: n:
>>>>>>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>> +  ; CHECK: vmovl.s8
>>>>>>>> +  ; CHECK: vaddw.s16
>>>>>>>> +  %1 = load <4 x i8>* %in, align 4
>>>>>>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
>>>>>>>> +  %3 = add <4 x i32> %2, %x
>>>>>>>> +  ret <4 x i32> %3
>>>>>>>> +}
>>>>>>>> +; CHECK: o:
>>>>>>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>>>>>> +  ; CHECK: vld1.16
>>>>>>>> +  ; CHECK: vmovl.s8
>>>>>>>> +  ; CHECK: vmovl.s16
>>>>>>>> +  ; CHECK: vaddw.s32
>>>>>>>> +  %1 = load <2 x i8>* %in, align 4
>>>>>>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>> +  ret <2 x i64> %3
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +; CHECK: p:
>>>>>>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>> +  ; CHECK: vmovl.s16
>>>>>>>> +  ; CHECK: vaddw.s32
>>>>>>>> +  %1 = load <2 x i16>* %in, align 4
>>>>>>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>> +  ret <2 x i64> %3
>>>>>>>> +}
>>>>>>>> Index: lib/Target/ARM/ARMISelLowering.h
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- lib/Target/ARM/ARMISelLowering.h    (revision 164117)
>>>>>>>> +++ lib/Target/ARM/ARMISelLowering.h    (working copy)
>>>>>>>> @@ -403,6 +406,8 @@
>>>>>>>>   void addDRTypeForNEON(MVT VT);
>>>>>>>>   void addQRTypeForNEON(MVT VT);
>>>>>>>> 
>>>>>>>> +    void addExtendLegalizeTypeForNEON(MVT VT);
>>>>>>>> +
>>>>>>>>   typedef SmallVector<std::pair<unsigned, SDValue>, 8>
>>>>>>> RegsToPassVector;
>>>>>>>>   void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
>>>>>>>>                         SDValue Chain, SDValue &Arg,
>>>>>>>> Index: lib/Target/ARM/ARMISelLowering.cpp
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- lib/Target/ARM/ARMISelLowering.cpp    (revision 164117)
>>>>>>>> +++ lib/Target/ARM/ARMISelLowering.cpp    (working copy)
>>>>>>>> @@ -158,6 +158,15 @@
>>>>>>>> addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
>>>>>>>> +      setLoadExtAction(ISD::EXTLOAD, VT, Legal);
>>>>>>>> +      setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
>>>>>>>> +      setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
>>>>>>>> +      setOperationAction(ISD::ANY_EXTEND, VT, Expand);
>>>>>>>> +      setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
>>>>>>>> +      setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
>>>>>>>> +}
>>>>>>> 
>>>>>>> Formatting. Only two space indentation required.
>>>>>>> 
>>>>>>>> +
>>>>>>>> static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
>>>>>>>> if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
>>>>>>>>   return new TargetLoweringObjectFileMachO();
>>>>>>>> @@ -556,15 +565,17 @@
>>>>>>>>   setTargetDAGCombine(ISD::FP_TO_UINT);
>>>>>>>>   setTargetDAGCombine(ISD::FDIV);
>>>>>>>> 
>>>>>>>> -    // It is legal to extload from v4i8 to v4i16 or v4i32.
>>>>>>>> -    MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
>>>>>>>> -                  MVT::v4i16, MVT::v2i16,
>>>>>>>> -                  MVT::v2i32};
>>>>>>>> -    for (unsigned i = 0; i < 6; ++i) {
>>>>>>>> -      setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
>>>>>>>> -      setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
>>>>>>>> -      setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
>>>>>>>> -    }
>>>>>>>> +    // There is no instruction to expand v4i8 to v4i32 and
>>>>>>>> +    // we must do this with two vmovl instructions. However,
>>>>>>>> +    // the vmovl instructions can be combined with other
>>>>>>>> +    // lengthening instructions, so we would prefer to lower
>>>>>>>> +    // (zextload x) -> (zext (zextload x).
>>>>>>>> +    addExtendLegalizeTypeForNEON(MVT::v8i8);
>>>>>>>> +    addExtendLegalizeTypeForNEON(MVT::v4i8);
>>>>>>>> +    addExtendLegalizeTypeForNEON(MVT::v2i8);
>>>>>>>> +    addExtendLegalizeTypeForNEON(MVT::v4i16);
>>>>>>>> +    addExtendLegalizeTypeForNEON(MVT::v2i16);
>>>>>>>> +    addExtendLegalizeTypeForNEON(MVT::v2i32);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> // ARM and Thumb2 support UMLAL/SMLAL.
>>>>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (revision
>> 164117)
>>>>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (working copy)
>>>>>>>> @@ -4342,7 +4342,7 @@
>>>>>>>> // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
>>>>>>>> // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
>>>>>>>> if ((ISD::isSEXTLoad(N0.getNode()) ||
>>>>>>> ISD::isEXTLoad(N0.getNode())) &&
>>>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>>>>>> !VT.isVector()) {
>>>>>>>>   LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>>>>>   EVT MemVT = LN0->getMemoryVT();
>>>>>>>>   if ((!LegalOperations && !LN0->isVolatile()) ||
>>>>>>>> @@ -4678,7 +4678,7 @@
>>>>>>>> // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
>>>>>>>> // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
>>>>>>>> if ((ISD::isZEXTLoad(N0.getNode()) ||
>>>>>>> ISD::isEXTLoad(N0.getNode())) &&
>>>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>>>>>> !VT.isVector()) {
>>>>>>>>   LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>>>>>   EVT MemVT = LN0->getMemoryVT();
>>>>>>>>   if ((!LegalOperations && !LN0->isVolatile()) ||
>>>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h    (revision
>> 164117)
>>>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h    (working copy)
>>>>>>>> @@ -287,6 +287,8 @@
>>>>>>>> SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
>>>>>>>> SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>>>>>>>> 
>>>>>>>> +  SDValue LowerExtendOp(SDNode *N);
>>>>>>>> +
>>>>>>>> void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
>>>>>> ISD::CondCode
>>>>>>> Code);
>>>>>>>> 
>>>>>>>> //===----------------------------------------------------------
>> -
>>>>>> --
>>>>>>> -------===//
>>>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>> (revision
>>>>>>> 164117)
>>>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>> (working
>>>>>>> copy)
>>>>>>>> @@ -733,6 +733,51 @@
>>>>>>>> //  Integer Operand Promotion
>>>>>>>> //===-----------------------------------------------------------
>> --
>>>>>> --
>>>>>>> -------===//
>>>>>>>> 
>>>>>>>> +// LowerExtendOp - This function breaks up vector zext/sext
>> nodes
>>>>>> so
>>>>>>> that the
>>>>>>>> +// result will more closely resemble the operations available
>> on
>>>>>> the
>>>>>>> target.
>>>>>>>> +// The resulting nodes can be each combined with other nodes to
>>>>>>> create legal
>>>>>>>> +// operations.
>>>>>>> 
>>>>>>> Some addition questions below, but in general, this could
>> probably
>>>>>> use
>>>>>>> a bit more description in the comment about what's actually
>>>> happening
>>>>>>> here. Why this transform is a good idea and exactly what the
>>>>>> transform
>>>>>>> is, in particular.
>>>>>>> 
>>>>>>>> +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
>>>>>>> 
>>>>>>> Is "Lower" really the right name for this function?
>>>>>>> 
>>>>>>>> +  EVT VT0 = N->getValueType(0);
>>>>>>>> +  EVT VT1 = N->getOperand(0).getValueType();
>>>>>>>> +  DebugLoc dl = N->getDebugLoc();
>>>>>>>> +  // We only split the node if the operation is not legal,
>>>>>> otherwise
>>>>>>> we could get
>>>>>>>> +  // a legalization loop.
>>>>>>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
>>>>>>> TargetLowering::Legal)
>>>>>>> 
>>>>>>> You probably just want TLI.isOperationLegalOrCustom() here.
>>>>>>> 
>>>>>>>> +    return SDValue();
>>>>>>>> +
>>>>>>>> +  if (!VT0.isVector())
>>>>>>>> +    return SDValue();
>>>>>>> 
>>>>>>> Minor note, but I'd put this check first as it's cheaper and more
>>>>>>> likely to fire than the check for the operation being legal.
>>>>>>> 
>>>>>>>> +
>>>>>>>> +  // If there is already an instruction that could be directly
>>>>>>>> +  // selected for this there is no reason to expand since it
>> will
>>>>>>>> +  // generate a redundant expand instruction (from type X to
>> type
>>>>>>> X).
>>>>>>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
>>>>>>>> +    return SDValue();
>>>>>>> 
>>>>>>> I don't follow this. It seems pretty arbitrary. The comment
>> doesn't
>>>>>>> help much, since how do we know what instructions could be
>> selected
>>>>>> or
>>>>>>> not? Can you clarify?
>>>>>>> 
>>>>>>>> +
>>>>>>>> +  LLVMContext *Context = DAG.getContext();
>>>>>>> 
>>>>>>> Can just have this be a reference since that's how it's used
>> below
>>>>>> and
>>>>>>> we never can have a NULL context here.
>>>>>>> 
>>>>>>>> +
>>>>>>>> +  // We need to get the intermediate type. This type will be
>>>> twice
>>>>>>> as
>>>>>>>> +  // small as the destination type,
>>>>>>> 
>>>>>>> s/twice as small as/half the size of/
>>>>>>> 
>>>>>>>> since all expand instructions are
>>>>>>>> +  // from a length X to a length 2*X and we have expand
>>>>>> instructions
>>>>>>>> +  // for all destination types.
>>>>>>> 
>>>>>>> Not sure I understand what you mean by "we have expand
>> instructions
>>>>>> for
>>>>>>> all destination types." Can you clarify?
>>>>>>> 
>>>>>>>> +  EVT middleType = EVT::getVectorVT(*Context,
>>>>>>> 
>>>>>>> Variable names should start with a Capitol letter. It's just
>>>> function
>>>>>>> names that are camel-case starting w/ lower case. That said, it's
>>>>>>> unclear what "middle" means here. Perhaps a clearer name or a
>>>>>> comment?
>>>>>>> 
>>>>>>>> +                          EVT::getIntegerVT(*Context,
>>>>>>>> +
>>>>>> VT0.getScalarType().getSizeInBits()
>>>>>>> / 2),
>>>>>>>> +                          VT0.getVectorNumElements());
>>>>>>>> +
>>>>>>>> +  // We don't want to expand if we can't use a legal
>> intermediate
>>>>>>> type
>>>>>>>> +  // since there wouldn't be any benefit from this and we might
>>>>>> end
>>>>>>>> +  // up with nodes that cannot be selected in the instruction
>>>>>>> selection
>>>>>>>> +  // stage.
>>>>>>>> +  if (!TLI.isTypeLegal(middleType))
>>>>>>>> +    return SDValue();
>>>>>>>> +
>>>>>>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
>>>>>>>> +                     DAG.getNode(N->getOpcode(), dl,
>>>>>>>> +                                 middleType, N-
>>> getOperand(0)));
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> /// PromoteIntegerOperand - This method is called when the
>>>>>> specified
>>>>>>> operand of
>>>>>>>> /// the specified node is found to need promotion.  At this
>> point,
>>>>>>> all of the
>>>>>>>> /// result types of the node are known to be legal, but other
>>>>>>> operands of the
>>>>>>>> @@ -752,7 +797,10 @@
>>>>>>>> #endif
>>>>>>>>   llvm_unreachable("Do not know how to promote this operator's
>>>>>>> operand!");
>>>>>>>> 
>>>>>>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
>>>> break;
>>>>>>>> +  case ISD::ANY_EXTEND:
>>>>>>>> +    Res = LowerExtendOp(N);
>>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
>>>>>>>> +    break;
>>>>>>>> case ISD::ATOMIC_STORE:
>>>>>>>>   Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
>>>>>>>>   break;
>>>>>>>> @@ -774,15 +822,20 @@
>>>>>>>> case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
>>>>>> break;
>>>>>>>> case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo);
>>>>>>> break;
>>>>>>>> case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
>>>>>> break;
>>>>>>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
>>>>>> break;
>>>>>>>> +  case ISD::SIGN_EXTEND:
>>>>>>>> +    Res = LowerExtendOp(N);
>>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
>>>>>>>> +    break;
>>>>>>>> case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N);
>> break;
>>>>>>>> case ISD::STORE:        Res =
>>>>>>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
>>>>>>>>                                                  OpNo); break;
>>>>>>>> case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
>>>>>>>> case ISD::FP16_TO_FP32:
>>>>>>>> case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N);
>> break;
>>>>>>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
>>>>>> break;
>>>>>>>> -
>>>>>>>> +  case ISD::ZERO_EXTEND:
>>>>>>>> +    Res = LowerExtendOp(N);
>>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
>>>>>>>> +    break;
>>>>>>>> case ISD::SHL:
>>>>>>>> case ISD::SRA:
>>>>>>>> case ISD::SRL:
>>>>>>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>>>>>> 
>>>> ===================================================================
>>>>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (revision
>> 164117)
>>>>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (working copy)
>>>>>>>> @@ -2565,7 +2565,8 @@
>>>>>>>>           VT.getVectorNumElements() ==
>>>>>>>>           Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>          "Vector element count mismatch!");
>>>>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>>>>>> ISD::ZERO_EXTEND)
>>>>>>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>>>>>> ISD::ZERO_EXTEND) &&
>>>>>>>> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>>>>>     return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>>>>>> getOperand(0));
>>>>>>>>   else if (OpOpcode == ISD::UNDEF)
>>>>>>>>     // sext(undef) = 0, because the top bits will all be the
>>>>>> same.
>>>>>>>> @@ -2581,7 +2582,9 @@
>>>>>>>>           VT.getVectorNumElements() ==
>>>>>>>>           Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>          "Vector element count mismatch!");
>>>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) ->
>>>>>> (zext
>>>>>>> x)
>>>>>>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
>>>>>>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>>>>> +      // (zext (zext x)) -> (zext x)
>>>>>>>>     return getNode(ISD::ZERO_EXTEND, DL, VT,
>>>>>>>>                    Operand.getNode()->getOperand(0));
>>>>>>>>   else if (OpOpcode == ISD::UNDEF)
>>>>>>>> @@ -2599,8 +2602,9 @@
>>>>>>>>           Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>          "Vector element count mismatch!");
>>>>>>>> 
>>>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>>>>> ISD::SIGN_EXTEND
>>>>>>> ||
>>>>>>>> -        OpOpcode == ISD::ANY_EXTEND)
>>>>>>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>>>>>> ISD::SIGN_EXTEND ||
>>>>>>>> +        OpOpcode == ISD::ANY_EXTEND) &&
>>>>>>>> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>>>>>>     // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext
>>>>>> x)
>>>>>>>>     return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>>>>>> getOperand(0));
>>>>>>>>   else if (OpOpcode == ISD::UNDEF)
>>>>>>> 
>>>>>>> On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Ping.
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Silviu Baranga [mailto:silbar01 at arm.com]
>>>>>>>>> Sent: 20 September 2012 15:48
>>>>>>>>> To: 'Jim Grosbach'
>>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>>>> Subject: RE: [llvm-commits] [patch] Enable the generation of
>>>>>> vaddl-
>>>>>>> type
>>>>>>>>> instructions from extloads on ARM
>>>>>>>>> 
>>>>>>>>> Hi Jim,
>>>>>>>>> 
>>>>>>>>> Thanks for the comments! I am attaching a new patch with the
>>>>>>>>> changes applied.
>>>>>>>>> 
>>>>>>>>> To make the code target-independent I've added the check that
>>>>>>>>> the 'middleType' is legal, which is equivalent to the previous
>>>>>>>>> ARM specific check. I didn't check if the operation is legal,
>>>>>>>>> since if the code got there it is marked as 'Expand' anyway.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Silviu
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>>>>> Sent: 19 September 2012 17:56
>>>>>>>>>> To: Silviu Baranga
>>>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>>>>>> vaddl-
>>>>>>>>> type
>>>>>>>>>> instructions from extloads on ARM
>>>>>>>>>> 
>>>>>>>>>> Hi Silviu,
>>>>>>>>>> 
>>>>>>>>>> Making good progress! A few comments on the updated patch
>>>> inline.
>>>>>>>>>> 
>>>>>>>>>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
>>>>>>>>>>> 
>>>>>>> 
>> ===================================================================
>>>>>>>>>>> --- test/CodeGen/ARM/vector-extend-narrow.ll    (revision
>>>> 164117)
>>>>>>>>>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll    (working
>> copy)
>>>>>>>>>>> @@ -73,3 +73,68 @@
>>>>>>>>>>> ret <4 x i32> %2
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> +; CHECK: k:
>>>>>>>>>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>>>>> +  ; CHECK: vmovl.u8
>>>>>>>>>>> +  ; CHECK: vaddw.u16
>>>>>>>>>>> +  %1 = load <4 x i8>* %in, align 4
>>>>>>>>>>> +  %2 = zext <4 x i8> %1 to <4 x i32>
>>>>>>>>>>> +  %3 = add <4 x i32> %2, %x
>>>>>>>>>>> +  ret <4 x i32> %3
>>>>>>>>>>> +}
>>>>>>>>>>> +; CHECK: l:
>>>>>>>>>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>>>>>>>>> +  ; CHECK: vld1.16
>>>>>>>>>>> +  ; CHECK: vmovl.u8
>>>>>>>>>>> +  ; CHECK: vmovl.u16
>>>>>>>>>>> +  ; CHECK: vaddw.u32
>>>>>>>>>>> +  %1 = load <2 x i8>* %in, align 4
>>>>>>>>>>> +  %2 = zext <2 x i8> %1 to <2 x i64>
>>>>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>>>>> +  ret <2 x i64> %3
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +; CHECK: m:
>>>>>>>>>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>>>>> +  ; CHECK: vmovl.u16
>>>>>>>>>>> +  ; CHECK: vaddw.u32
>>>>>>>>>>> +  %1 = load <2 x i16>* %in, align 4
>>>>>>>>>>> +  %2 = zext <2 x i16> %1 to <2 x i64>
>>>>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>>>>> +  ret <2 x i64> %3
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +; CHECK: n:
>>>>>>>>>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>>>>> +  ; CHECK: vmovl.s8
>>>>>>>>>>> +  ; CHECK: vaddw.s16
>>>>>>>>>>> +  %1 = load <4 x i8>* %in, align 4
>>>>>>>>>>> +  %2 = sext <4 x i8> %1 to <4 x i32>
>>>>>>>>>>> +  %3 = add <4 x i32> %2, %x
>>>>>>>>>>> +  ret <4 x i32> %3
>>>>>>>>>>> +}
>>>>>>>>>>> +; CHECK: o:
>>>>>>>>>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>>>>>>>>> +  ; CHECK: vld1.16
>>>>>>>>>>> +  ; CHECK: vmovl.s8
>>>>>>>>>>> +  ; CHECK: vmovl.s16
>>>>>>>>>>> +  ; CHECK: vaddw.s32
>>>>>>>>>>> +  %1 = load <2 x i8>* %in, align 4
>>>>>>>>>>> +  %2 = sext <2 x i8> %1 to <2 x i64>
>>>>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>>>>> +  ret <2 x i64> %3
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +; CHECK: p:
>>>>>>>>>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>>>>>>>>> +  ; CHECK: vld1.32
>>>>>>>>>>> +  ; CHECK: vmovl.s16
>>>>>>>>>>> +  ; CHECK: vaddw.s32
>>>>>>>>>>> +  %1 = load <2 x i16>* %in, align 4
>>>>>>>>>>> +  %2 = sext <2 x i16> %1 to <2 x i64>
>>>>>>>>>>> +  %3 = add <2 x i64> %2, %x
>>>>>>>>>>> +  ret <2 x i64> %3
>>>>>>>>>>> +}
>>>>>>>>>>> Index: lib/Target/ARM/ARMISelLowering.cpp
>>>>>>>>>>> 
>>>>>>> 
>> ===================================================================
>>>>>>>>>>> --- lib/Target/ARM/ARMISelLowering.cpp    (revision 164117)
>>>>>>>>>>> +++ lib/Target/ARM/ARMISelLowering.cpp    (working copy)
>>>>>>>>>>> @@ -565,6 +570,12 @@
>>>>>>>>>>>    setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
>>>>>>>>>>>    setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
>>>>>>>>>>>  }
>>>>>>>>>>> +
>>>>>>>>>>> +    for (unsigned i = 0; i < 6; ++i) {
>>>>>>>>>>> +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
>>>>>>>>>>> +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
>>>>>>>>>>> +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
>>>>>>>>>>> +    }
>>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> I'd have sworn I commented about this bit before, but I can't
>>>>>> find
>>>>>>> it
>>>>>>>>>> in my email logs.
>>>>>>>>>> 
>>>>>>>>>> Anyways, This, and the loop above, should be moved out to a
>>>>>> helper
>>>>>>>>>> function and the loop removed. That is, the helper should
>>>> perform
>>>>>>> the
>>>>>>>>>> setOperationAction() calls for a single type, then the top
>> level
>>>>>>>>>> function just calls that helper 6 times, once for each MVT.
>> The
>>>>>>> NEON
>>>>>>>>>> bits for setting legal types do something similar.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> // ARM and Thumb2 support UMLAL/SMLAL.
>>>>>>>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>>>>>>>>> 
>>>>>>> 
>> ===================================================================
>>>>>>>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (revision
>>>> 164117)
>>>>>>>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp    (working
>> copy)
>>>>>>>>>>> @@ -4342,7 +4342,7 @@
>>>>>>>>>>> // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
>>>>>>>>>>> // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
>>>>>>>>>>> if ((ISD::isSEXTLoad(N0.getNode()) ||
>>>>>>>>>> ISD::isEXTLoad(N0.getNode())) &&
>>>>>>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse())
>> {
>>>>>>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()
>> &&
>>>>>>>>>> !VT.isVector()) {
>>>>>>>>>>>  LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>>>>>>>>  EVT MemVT = LN0->getMemoryVT();
>>>>>>>>>>>  if ((!LegalOperations && !LN0->isVolatile()) ||
>>>>>>>>>>> @@ -4678,7 +4678,7 @@
>>>>>>>>>>> // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
>>>>>>>>>>> // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
>>>>>>>>>>> if ((ISD::isZEXTLoad(N0.getNode()) ||
>>>>>>>>>> ISD::isEXTLoad(N0.getNode())) &&
>>>>>>>>>>> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse())
>> {
>>>>>>>>>>> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()
>> &&
>>>>>>>>>> !VT.isVector()) {
>>>>>>>>>>>  LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>>>>>>>>>  EVT MemVT = LN0->getMemoryVT();
>>>>>>>>>>>  if ((!LegalOperations && !LN0->isVolatile()) ||
>>>>>>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
>>>>>>>>>>> 
>>>>>>> 
>> ===================================================================
>>>>>>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h    (revision
>>>> 164117)
>>>>>>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h    (working
>> copy)
>>>>>>>>>>> @@ -287,6 +287,8 @@
>>>>>>>>>>> SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
>>>>>>>>>>> SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>>>>>>>>>>> 
>>>>>>>>>>> +  SDValue ExpandExtend(SDNode *N);
>>>>>>>>>>> +
>>>>>>>>>>> void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
>>>>>>>>> ISD::CondCode
>>>>>>>>>> Code);
>>>>>>>>>>> 
>>>>>>>>>>> //===--------------------------------------------------------
>> -
>>>>>> --
>>>>>>>>> --
>>>>>>>>>> -------===//
>>>>>>>>>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>>>>>>>>> 
>>>>>>> 
>> ===================================================================
>>>>>>>>>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>>>>  (revision
>>>>>>>>>> 164117)
>>>>>>>>>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>>>>  (working
>>>>>>>>>> copy)
>>>>>>>>>>> @@ -732,7 +732,44 @@
>>>>>>>>>>> //===--------------------------------------------------------
>> --
>>>> -
>>>>>> --
>>>>>>>>> --
>>>>>>>>>> -------===//
>>>>>>>>>>> //  Integer Operand Promotion
>>>>>>>>>>> //===--------------------------------------------------------
>> --
>>>> -
>>>>>> --
>>>>>>>>> --
>>>>>>>>>> -------===//
>>>>>>>>>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
>>>>>>>>>> 
>>>>>>>>>> This function really needs a comment explaining its purpose
>> and
>>>>>>>>> perhaps
>>>>>>>>>> a different name. It's not really doing a full expand of the
>>>>>>> extend,
>>>>>>>>>> but rather doing some housekeeping beforehand to make sure the
>>>>>>>>>> expansion that results from the Promote*() call will be
>>>>>> reasonable
>>>>>>>>> for
>>>>>>>>>> the target. In any case, some comments about the motivation
>> and
>>>>>>> logic
>>>>>>>>>> would be very helpful.
>>>>>>>>>> 
>>>>>>>>>>> +  EVT VT0 = N->getValueType(0);
>>>>>>>>>>> +  EVT VT1 = N->getOperand(0).getValueType();
>>>>>>>>>>> +  DebugLoc dl = N->getDebugLoc();
>>>>>>>>>>> +  // We only split the node if the operation is not legal,
>>>>>>>>> otherwise
>>>>>>>>>> we could get
>>>>>>>>>>> +  // a legalization loop.
>>>>>>>>>>> +  if (TLI.getOperationAction(N->getOpcode(), VT1) ==
>>>>>>>>>> TargetLowering::Legal)
>>>>>>>>>>> +    return SDValue();
>>>>>>>>>>> 
>>>>>>>>>>> +  if (!VT0.isVector())
>>>>>>>>>>> +    return SDValue();
>>>>>>>>>>> +
>>>>>>>>>>> +  // We don't want expand if the intermediate type is the
>>>>>> v2i16,
>>>>>>>>>>> +  // since v2i16 is not a legal type. This only happens when
>>>>>> the
>>>>>>>>>>> +  // destination has the v2i32 type and the source is v2i8.
>>>>>>>>>>> +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
>>>>>>>>>>> +    return SDValue();
>>>>>>>>>> 
>>>>>>>>>> This is in generic code now, so we can't make assumptions
>> about
>>>>>>> what
>>>>>>>>>> types are legal. We need to ask the target. This function
>> needs
>>>>>> to
>>>>>>> be
>>>>>>>>>> made more generic, basically. For example, here we assume that
>>>>>>> v2i16
>>>>>>>>>> isn't legal. That's true for ARM, but quite possibly untrue
>> for
>>>>>>> other
>>>>>>>>>> targets that support 32-bit vector types. Likewise, we may
>> need
>>>>>> to
>>>>>>>>>> check for which operations are legal as well as which types.
>>>>>>>>>> 
>>>>>>>>>>> +  // If there is already an instruction that could be
>> directly
>>>>>>>>>>> +  // selected for this there is no reason to expand since it
>>>>>> will
>>>>>>>>>>> +  // generate a redundant expand instruction (from type X to
>>>>>> type
>>>>>>>>>> X).
>>>>>>>>>>> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
>>>>>>>>>>> +    return SDValue();
>>>>>>>>>>> +
>>>>>>>>>>> +  LLVMContext *Context = DAG.getContext();
>>>>>>>>>>> +
>>>>>>>>>>> +  // We need to get the intermediate type. This type will be
>>>>>>> twice
>>>>>>>>>> as
>>>>>>>>>>> +  // small as the destination type, since all expand
>>>>>> instructions
>>>>>>>>>> are
>>>>>>>>>>> +  // from a length X to a length 2*X and we have expand
>>>>>>>>> instructions
>>>>>>>>>>> +  // for all destination types.
>>>>>>>>>>> +  EVT middleType = EVT::getVectorVT(*Context,
>>>>>>>>>>> +                          EVT::getIntegerVT(*Context,
>>>>>>>>>>> +
>>>>>>>>> VT0.getScalarType().getSizeInBits()
>>>>>>>>>> / 2),
>>>>>>>>>>> +                          VT0.getVectorNumElements());
>>>>>>>>>>> +  return DAG.getNode(N->getOpcode(), dl, VT0,
>>>>>>>>>>> +                     DAG.getNode(N->getOpcode(), dl,
>>>>>>>>>>> +                                 middleType, N-
>>>>>>> getOperand(0)));
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> /// PromoteIntegerOperand - This method is called when the
>>>>>>>>> specified
>>>>>>>>>> operand of
>>>>>>>>>>> /// the specified node is found to need promotion.  At this
>>>>>> point,
>>>>>>>>>> all of the
>>>>>>>>>>> /// result types of the node are known to be legal, but other
>>>>>>>>>> operands of the
>>>>>>>>>>> @@ -752,7 +789,10 @@
>>>>>>>>>>> #endif
>>>>>>>>>>>  llvm_unreachable("Do not know how to promote this
>> operator's
>>>>>>>>>> operand!");
>>>>>>>>>>> 
>>>>>>>>>>> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N);
>>>>>>> break;
>>>>>>>>>>> +  case ISD::ANY_EXTEND:
>>>>>>>>>>> +    Res = ExpandExtend(N);
>>>>>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
>>>>>>>>>>> +    break;
>>>>>>>>>>> case ISD::ATOMIC_STORE:
>>>>>>>>>>>  Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
>>>>>>>>>>>  break;
>>>>>>>>>>> @@ -774,15 +814,20 @@
>>>>>>>>>>> case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo);
>>>>>>>>> break;
>>>>>>>>>>> case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N,
>> OpNo);
>>>>>>>>>> break;
>>>>>>>>>>> case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo);
>>>>>>>>> break;
>>>>>>>>>>> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N);
>>>>>>>>> break;
>>>>>>>>>>> +  case ISD::SIGN_EXTEND:
>>>>>>>>>>> +    Res = ExpandExtend(N);
>>>>>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
>>>>>>>>>>> +    break;
>>>>>>>>>>> case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N);
>>>>>> break;
>>>>>>>>>>> case ISD::STORE:        Res =
>>>>>>>>>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
>>>>>>>>>>>                                                 OpNo);
>> break;
>>>>>>>>>>> case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N);
>> break;
>>>>>>>>>>> case ISD::FP16_TO_FP32:
>>>>>>>>>>> case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N);
>>>>>> break;
>>>>>>>>>>> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N);
>>>>>>>>> break;
>>>>>>>>>>> -
>>>>>>>>>>> +  case ISD::ZERO_EXTEND:
>>>>>>>>>>> +    Res = ExpandExtend(N);
>>>>>>>>>>> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
>>>>>>>>>>> +    break;
>>>>>>>>>>> case ISD::SHL:
>>>>>>>>>>> case ISD::SRA:
>>>>>>>>>>> case ISD::SRL:
>>>>>>>>>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>>>>>>>>> 
>>>>>>> 
>> ===================================================================
>>>>>>>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (revision
>>>> 164117)
>>>>>>>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp    (working
>> copy)
>>>>>>>>>>> @@ -2565,7 +2565,8 @@
>>>>>>>>>>>          VT.getVectorNumElements() ==
>>>>>>>>>>>          Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>>>>         "Vector element count mismatch!");
>>>>>>>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>>>>>>>>> ISD::ZERO_EXTEND)
>>>>>>>>>>> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>>>>>>>>> ISD::ZERO_EXTEND) &&
>>>>>>>>>>> +       TLI.isOperationLegal(OpOpcode,
>> Operand.getValueType()))
>>>>>>>>>>>    return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>>>>>>>>> getOperand(0));
>>>>>>>>>>>  else if (OpOpcode == ISD::UNDEF)
>>>>>>>>>>>    // sext(undef) = 0, because the top bits will all be the
>>>>>>>>> same.
>>>>>>>>>>> @@ -2581,7 +2582,9 @@
>>>>>>>>>>>          VT.getVectorNumElements() ==
>>>>>>>>>>>          Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>>>>         "Vector element count mismatch!");
>>>>>>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) -
>>> 
>>>>>>>>> (zext
>>>>>>>>>> x)
>>>>>>>>>>> +    if (OpOpcode == ISD::ZERO_EXTEND &&
>>>>>>>>>>> +        TLI.isOperationLegal(OpOpcode,
>>>> Operand.getValueType()))
>>>>>>>>>>> +      // (zext (zext x)) -> (zext x)
>>>>>>>>>>>    return getNode(ISD::ZERO_EXTEND, DL, VT,
>>>>>>>>>>>                   Operand.getNode()->getOperand(0));
>>>>>>>>>>>  else if (OpOpcode == ISD::UNDEF)
>>>>>>>>>>> @@ -2599,8 +2602,9 @@
>>>>>>>>>>>          Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>>>>         "Vector element count mismatch!");
>>>>>>>>>>> 
>>>>>>>>>>> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>>>>>>>> ISD::SIGN_EXTEND
>>>>>>>>>> ||
>>>>>>>>>>> -        OpOpcode == ISD::ANY_EXTEND)
>>>>>>>>>>> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>>>>>>>>> ISD::SIGN_EXTEND ||
>>>>>>>>>>> +        OpOpcode == ISD::ANY_EXTEND) &&
>>>>>>>>>>> +        TLI.isOperationLegal(OpOpcode,
>>>> Operand.getValueType()))
>>>>>>>>>>>    // (ext (zext x)) -> (zext x)  and  (ext (sext x)) ->
>>>>>> (sext
>>>>>>>>> x)
>>>>>>>>>>>    return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>>>>>>>>> getOperand(0));
>>>>>>>>>>>  else if (OpOpcode == ISD::UNDEF)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com>
>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Jim,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the review. It does make sense to do this in a
>>>> target
>>>>>>>>>>> independent way. I've attached a new patch that does this.
>>>>>>>>>>> 
>>>>>>>>>>> I've moved the code ARM-specific code into the legalizer and
>>>>>> also
>>>>>>>>>>> added tests for sext nodes. It appears that I was mistaken
>>>>>> before
>>>>>>>>>>> and (sext (sextload)) patterns were not prevented from
>> folding
>>>>>> for
>>>>>>>>>>> vector types. This was stopping the generation of vaddl-type
>>>>>>>>>>> instructions. I think that stopping the folding of (sext
>>>>>>>>> (sextload))
>>>>>>>>>>> patterns is still OK, because the merging would result in a
>>>>>>> illegal
>>>>>>>>>>> operation.
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Silviu
>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>>>>>>> Sent: 17 September 2012 18:09
>>>>>>>>>>>> To: Silviu Baranga
>>>>>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>>>>>>>>> vaddl-
>>>>>>>>>> type
>>>>>>>>>>>> instructions from extloads on ARM
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Silviu,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for the update. This is definitely much better. Why
>> do
>>>>>> we
>>>>>>>>>> need
>>>>>>>>>>>> any ARM specific code here, though? Given the problem
>>>>>>> description,
>>>>>>>>>> it
>>>>>>>>>>>> seems like this is something the legalizer should be able to
>>>>>>>>> reason
>>>>>>>>>>>> about without needing custom handling in the target.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Jim
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga
>> <silbar01 at arm.com>
>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jim,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the review. The BREAK node solution was a bit
>>>> ugly.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I've attached a new patch that solves this without a BREAK
>>>>>> node.
>>>>>>>>>>>>> This is done by stopping the folding of nodes zext nodes
>> when
>>>>>>>>>>>>> this will result in a non-legal operation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also had to stop the (zext (zextload x)) folding for
>> vector
>>>>>>>>>>>>> types. This is already being done for sext nodes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Silviu
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>>>>>>>>> Sent: 06 September 2012 01:42
>>>>>>>>>>>>>> To: Silviu Baranga
>>>>>>>>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>>>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation
>> of
>>>>>>>>>> vaddl-
>>>>>>>>>>>> type
>>>>>>>>>>>>>> instructions from extloads on ARM
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Silviu,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I like the idea of teaching the compiler to use these
>>>>>>>>>> instructions,
>>>>>>>>>>>> but
>>>>>>>>>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd
>>>>>> much
>>>>>>>>>>>> prefer
>>>>>>>>>>>>>> we figure out how to teach the combiner and isel to be
>>>>>> smarter
>>>>>>>>>>>> directly
>>>>>>>>>>>>>> rather than doing something like that.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Jim
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga
>>>> <silbar01 at arm.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Currently LLVM will not generate a vaddl-type
>> instructions
>>>>>>> when
>>>>>>>>>> an
>>>>>>>>>>>>>> extload
>>>>>>>>>>>>>>> node is in the input dag. All vector extload nodes are
>>>>>>>>>> implemented
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> a load instruction and a lengthening instruction on ARM.
>> If
>>>>>>> the
>>>>>>>>>>>>>> lengthening
>>>>>>>>>>>>>>> instruction is followed by an add instruction, it would
>> be
>>>>>>>>> better
>>>>>>>>>>>> to
>>>>>>>>>>>>>>> generate
>>>>>>>>>>>>>>> a vaddl-type instruction instead.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> To do this, we split all lengthening dag nodes which
>> extend
>>>>>> a
>>>>>>>>>> type
>>>>>>>>>>>> to
>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>> then twice its size (this is OK because these nodes would
>>>> be
>>>>>>>>>>>>>> implemented by
>>>>>>>>>>>>>>> multiple instructions anyway) into two lengthening nodes.
>>>>>> One
>>>>>>>>>> node
>>>>>>>>>>>>>> will be
>>>>>>>>>>>>>>> later
>>>>>>>>>>>>>>> combined to form the extload node, and the other can be
>>>>>>>>> selected
>>>>>>>>>> as
>>>>>>>>>>>>>> part of
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> vaddl-type instruction (there are already patterns for
>>>>>> this).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> To stop the dag nodes from re-combining, we insert a
>> BREAK
>>>>>>> node
>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>> them. The
>>>>>>>>>>>>>>> BREAK node is used as an optimization barrier for the DAG
>>>>>>>>>> combiner,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> might be
>>>>>>>>>>>>>>> useful in other cases as well. We remove the BREAK node
>> in
>>>>>> the
>>>>>>>>>>>>>> instruction
>>>>>>>>>>>>>>> selection stage.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This approach allows the reuse of existing selection
>>>>>> patterns.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This patch depends on the following other patch:
>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-
>>>> Mon-
>>>>>>>>>>>>>> 20120827/149201.
>>>>>>>>>>>>>>> html
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Please review!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>> 
>> Silviu<patch.diff>_______________________________________________
>>>>>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> <zext_split.diff>
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> <zext_split.diff>
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> -- IMPORTANT NOTICE: The contents of this email and any attachments
>> are confidential and may also be privileged. If you are not the
>> intended recipient, please notify the sender immediately and do not
>> disclose the contents to any other person, use it for any purpose, or
>> store or copy the information in any medium.  Thank you.<zext.diff>
>> 
>> 
> 
> 
> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.<zext.diff>