[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM

Jim Grosbach grosbach at apple.com
Wed Sep 19 09:56:20 PDT 2012


Hi Silviu,

Making good progress! A few comments on the updated patch inline.

> Index: test/CodeGen/ARM/vector-extend-narrow.ll
> ===================================================================
> --- test/CodeGen/ARM/vector-extend-narrow.ll	(revision 164117)
> +++ test/CodeGen/ARM/vector-extend-narrow.ll	(working copy)
> @@ -73,3 +73,68 @@
>    ret <4 x i32> %2
>  }
>  
> +; CHECK: k:
> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> +  ; CHECK: vld1.32
> +  ; CHECK: vmovl.u8
> +  ; CHECK: vaddw.u16
> +  %1 = load <4 x i8>* %in, align 4
> +  %2 = zext <4 x i8> %1 to <4 x i32>
> +  %3 = add <4 x i32> %2, %x
> +  ret <4 x i32> %3
> +}
> +; CHECK: l:
> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> +  ; CHECK: vld1.16
> +  ; CHECK: vmovl.u8
> +  ; CHECK: vmovl.u16
> +  ; CHECK: vaddw.u32
> +  %1 = load <2 x i8>* %in, align 4
> +  %2 = zext <2 x i8> %1 to <2 x i64>
> +  %3 = add <2 x i64> %2, %x
> +  ret <2 x i64> %3
> +}
> +
> +; CHECK: m:
> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> +  ; CHECK: vld1.32
> +  ; CHECK: vmovl.u16
> +  ; CHECK: vaddw.u32
> +  %1 = load <2 x i16>* %in, align 4
> +  %2 = zext <2 x i16> %1 to <2 x i64>
> +  %3 = add <2 x i64> %2, %x
> +  ret <2 x i64> %3
> +}
> +
> +; CHECK: n:
> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> +  ; CHECK: vld1.32
> +  ; CHECK: vmovl.s8
> +  ; CHECK: vaddw.s16
> +  %1 = load <4 x i8>* %in, align 4
> +  %2 = sext <4 x i8> %1 to <4 x i32>
> +  %3 = add <4 x i32> %2, %x
> +  ret <4 x i32> %3
> +}
> +; CHECK: o:
> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> +  ; CHECK: vld1.16
> +  ; CHECK: vmovl.s8
> +  ; CHECK: vmovl.s16
> +  ; CHECK: vaddw.s32
> +  %1 = load <2 x i8>* %in, align 4
> +  %2 = sext <2 x i8> %1 to <2 x i64>
> +  %3 = add <2 x i64> %2, %x
> +  ret <2 x i64> %3
> +}
> +
> +; CHECK: p:
> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> +  ; CHECK: vld1.32
> +  ; CHECK: vmovl.s16
> +  ; CHECK: vaddw.s32
> +  %1 = load <2 x i16>* %in, align 4
> +  %2 = sext <2 x i16> %1 to <2 x i64>
> +  %3 = add <2 x i64> %2, %x
> +  ret <2 x i64> %3
> +}
> Index: lib/Target/ARM/ARMISelLowering.cpp
> ===================================================================
> --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164117)
> +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> @@ -565,6 +570,12 @@
>        setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
>        setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
>      }
> +
> +    for (unsigned i = 0; i < 6; ++i) {
> +      setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
> +      setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
> +      setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
> +    }
>    }

I'd have sworn I commented about this bit before, but I can't find it in my email logs.

Anyway, this and the loop above should be moved out into a helper function and the loops removed. That is, the helper should perform the setLoadExtAction()/setOperationAction() calls for a single type, and the top-level function then just calls that helper six times, once for each MVT. The NEON bits for setting up legal types do something similar.
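
Something along these lines, as an untested sketch (the helper name is just a suggestion; it would live in ARMISelLowering.cpp as a private member of ARMTargetLowering, with a matching declaration in the header, and the exact set of calls should mirror whatever the existing Tys[] loop does):

  // Mark a single narrow vector type: extending loads from it are legal,
  // and plain extend nodes on it should be expanded (split) instead.
  void ARMTargetLowering::addExtendActionsForType(MVT VT) {
    setLoadExtAction(ISD::EXTLOAD,  VT, Legal);  // assuming EXTLOAD is also in the existing loop
    setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
    setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
    setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
    setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
    setOperationAction(ISD::ANY_EXTEND,  VT, Expand);
  }

Then the constructor just calls addExtendActionsForType() once per MVT currently in Tys[] (e.g. addExtendActionsForType(MVT::v4i8); and so on; the types named here are only illustrative).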


>  
>    // ARM and Thumb2 support UMLAL/SMLAL.
> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 164117)
> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
> @@ -4342,7 +4342,7 @@
>    // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
>    // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
>    if ((ISD::isSEXTLoad(N0.getNode()) || ISD::isEXTLoad(N0.getNode())) &&
> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() && !VT.isVector()) {
>      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>      EVT MemVT = LN0->getMemoryVT();
>      if ((!LegalOperations && !LN0->isVolatile()) ||
> @@ -4678,7 +4678,7 @@
>    // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
>    // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
>    if ((ISD::isZEXTLoad(N0.getNode()) || ISD::isEXTLoad(N0.getNode())) &&
> -      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> +      ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() && !VT.isVector()) {
>      LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>      EVT MemVT = LN0->getMemoryVT();
>      if ((!LegalOperations && !LN0->isVolatile()) ||
> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> ===================================================================
> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h	(revision 164117)
> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h	(working copy)
> @@ -287,6 +287,8 @@
>    SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
>    SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>  
> +  SDValue ExpandExtend(SDNode *N);
> +
>    void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
>  
>    //===--------------------------------------------------------------------===//
> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(revision 164117)
> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	(working copy)
> @@ -732,7 +732,44 @@
>  //===----------------------------------------------------------------------===//
>  //  Integer Operand Promotion
>  //===----------------------------------------------------------------------===//
> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) { 

This function really needs a comment explaining its purpose, and perhaps a different name. It's not really doing a full expansion of the extend; rather, it does some housekeeping beforehand to make sure the expansion that results from the Promote*() call will be reasonable for the target. In any case, some comments about the motivation and the logic would be very helpful.
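
For example, something like the following (the name is just a strawman):

  /// SplitExtendBeforePromotion - Before promoting the operand of a vector
  /// extend whose result is more than twice as wide as its source, split the
  /// extend into two steps through a half-width intermediate type. The inner
  /// step can later be folded into an extending load, and the outer step can
  /// still be matched directly by the target (e.g. as part of a vaddl on
  /// ARM). Returns a null SDValue when no splitting is wanted, in which case
  /// the normal Promote*() path takes over.
  SDValue DAGTypeLegalizer::SplitExtendBeforePromotion(SDNode *N) {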

> +  EVT VT0 = N->getValueType(0);
> +  EVT VT1 = N->getOperand(0).getValueType();
> +  DebugLoc dl = N->getDebugLoc();
> +  // We only split the node if the operation is not legal, otherwise we could get
> +  // a legalization loop.
> +  if (TLI.getOperationAction(N->getOpcode(), VT1) == TargetLowering::Legal)
> +    return SDValue();
>  
> +  if (!VT0.isVector())
> +    return SDValue();
> +
> +  // We don't want expand if the intermediate type is the v2i16,
> +  // since v2i16 is not a legal type. This only happens when the
> +  // destination has the v2i32 type and the source is v2i8.
> +  if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
> +    return SDValue();

This is in generic code now, so we can't make assumptions about which types are legal; we need to ask the target. Basically, this function needs to be made more generic. For example, here we assume that v2i16 isn't legal. That's true for ARM, but quite possibly untrue for other targets that support 32-bit vector types. Likewise, we may need to check which operations are legal as well as which types.
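
For example, compute the intermediate type first and then ask the target about it, rather than special-casing v2i8 -> v2i32. A rough, untested sketch of what I mean (the exact legality predicates probably need more thought):

  // Intermediate type: same element count, half the element width of the
  // destination type.
  LLVMContext &Ctx = *DAG.getContext();
  EVT MidVT = EVT::getVectorVT(Ctx,
                EVT::getIntegerVT(Ctx, VT0.getScalarType().getSizeInBits() / 2),
                VT0.getVectorNumElements());

  // Only split if the target considers the intermediate type legal; on ARM
  // this rejects the v2i8 -> v2i32 case (v2i16 is illegal there) without
  // hard-coding any particular types. A similar isOperationLegalOrCustom()
  // check on the extend itself may be needed as well.
  if (!TLI.isTypeLegal(MidVT))
    return SDValue();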

> +  // If there is already an instruction that could be directly
> +  // selected for this there is no reason to expand since it will
> +  // generate a redundant expand instruction (from type X to type X).
> +  if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> +    return SDValue();
> +
> +  LLVMContext *Context = DAG.getContext();
> +
> +  // We need to get the intermediate type. This type will be twice as
> +  // small as the destination type, since all expand instructions are
> +  // from a length X to a length 2*X and we have expand instructions
> +  // for all destination types.
> +  EVT middleType = EVT::getVectorVT(*Context,
> +                          EVT::getIntegerVT(*Context,
> +                                VT0.getScalarType().getSizeInBits() / 2),
> +                          VT0.getVectorNumElements());
> +  return DAG.getNode(N->getOpcode(), dl, VT0,
> +                     DAG.getNode(N->getOpcode(), dl,
> +                                 middleType, N->getOperand(0)));
> +}
> +
>  /// PromoteIntegerOperand - This method is called when the specified operand of
>  /// the specified node is found to need promotion.  At this point, all of the
>  /// result types of the node are known to be legal, but other operands of the
> @@ -752,7 +789,10 @@
>    #endif
>      llvm_unreachable("Do not know how to promote this operator's operand!");
>  
> -  case ISD::ANY_EXTEND:   Res = PromoteIntOp_ANY_EXTEND(N); break;
> +  case ISD::ANY_EXTEND:
> +    Res = ExpandExtend(N);
> +    if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> +    break;
>    case ISD::ATOMIC_STORE:
>      Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
>      break;
> @@ -774,15 +814,20 @@
>    case ISD::SELECT:       Res = PromoteIntOp_SELECT(N, OpNo); break;
>    case ISD::SELECT_CC:    Res = PromoteIntOp_SELECT_CC(N, OpNo); break;
>    case ISD::SETCC:        Res = PromoteIntOp_SETCC(N, OpNo); break;
> -  case ISD::SIGN_EXTEND:  Res = PromoteIntOp_SIGN_EXTEND(N); break;
> +  case ISD::SIGN_EXTEND:
> +    Res = ExpandExtend(N);
> +    if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> +    break;
>    case ISD::SINT_TO_FP:   Res = PromoteIntOp_SINT_TO_FP(N); break;
>    case ISD::STORE:        Res = PromoteIntOp_STORE(cast<StoreSDNode>(N),
>                                                     OpNo); break;
>    case ISD::TRUNCATE:     Res = PromoteIntOp_TRUNCATE(N); break;
>    case ISD::FP16_TO_FP32:
>    case ISD::UINT_TO_FP:   Res = PromoteIntOp_UINT_TO_FP(N); break;
> -  case ISD::ZERO_EXTEND:  Res = PromoteIntOp_ZERO_EXTEND(N); break;
> -
> +  case ISD::ZERO_EXTEND:
> +    Res = ExpandExtend(N);
> +    if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> +    break;
>    case ISD::SHL:
>    case ISD::SRA:
>    case ISD::SRL:
> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 164117)
> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
> @@ -2565,7 +2565,8 @@
>              VT.getVectorNumElements() ==
>              Operand.getValueType().getVectorNumElements()) &&
>             "Vector element count mismatch!");
> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
> +    if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND) &&
> +       TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>        return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>      else if (OpOpcode == ISD::UNDEF)
>        // sext(undef) = 0, because the top bits will all be the same.
> @@ -2581,7 +2582,9 @@
>              VT.getVectorNumElements() ==
>              Operand.getValueType().getVectorNumElements()) &&
>             "Vector element count mismatch!");
> -    if (OpOpcode == ISD::ZERO_EXTEND)   // (zext (zext x)) -> (zext x)
> +    if (OpOpcode == ISD::ZERO_EXTEND &&
> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> +      // (zext (zext x)) -> (zext x)
>        return getNode(ISD::ZERO_EXTEND, DL, VT,
>                       Operand.getNode()->getOperand(0));
>      else if (OpOpcode == ISD::UNDEF)
> @@ -2599,8 +2602,9 @@
>              Operand.getValueType().getVectorNumElements()) &&
>             "Vector element count mismatch!");
>  
> -    if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode == ISD::SIGN_EXTEND ||
> -        OpOpcode == ISD::ANY_EXTEND)
> +    if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode == ISD::SIGN_EXTEND ||
> +        OpOpcode == ISD::ANY_EXTEND) &&
> +        TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>        // (ext (zext x)) -> (zext x)  and  (ext (sext x)) -> (sext x)
>        return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>      else if (OpOpcode == ISD::UNDEF)




On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com> wrote:

> Hi Jim,
> 
> Thanks for the review. It does make sense to do this in a target
> independent way. I've attached a new patch that does this.
> 
> I've moved the ARM-specific code into the legalizer and also
> added tests for sext nodes. It appears that I was mistaken before
> and (sext (sextload)) patterns were not prevented from folding for
> vector types. This was stopping the generation of vaddl-type
> instructions. I think that stopping the folding of (sext (sextload))
> patterns is still OK, because the merging would result in an illegal
> operation.
> 
> Cheers,
> Silviu
> 
>> -----Original Message-----
>> From: Jim Grosbach [mailto:grosbach at apple.com]
>> Sent: 17 September 2012 18:09
>> To: Silviu Baranga
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-type
>> instructions from extloads on ARM
>> 
>> Hi Silviu,
>> 
>> Thanks for the update. This is definitely much better. Why do we need
>> any ARM specific code here, though? Given the problem description, it
>> seems like this is something the legalizer should be able to reason
>> about without needing custom handling in the target.
>> 
>> -Jim
>> 
>> 
>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com> wrote:
>> 
>>> Hi Jim,
>>> 
>>> Thanks for the review. The BREAK node solution was a bit ugly.
>>> 
>>> I've attached a new patch that solves this without a BREAK node.
>>> This is done by stopping the folding of zext nodes when
>>> this will result in a non-legal operation.
>>> 
>>> I also had to stop the (zext (zextload x)) folding for vector
>>> types. This is already being done for sext nodes.
>>> 
>>> Thanks,
>>> Silviu
>>> 
>>>> -----Original Message-----
>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>> Sent: 06 September 2012 01:42
>>>> To: Silviu Baranga
>>>> Cc: llvm-commits at cs.uiuc.edu
>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-type
>>>> instructions from extloads on ARM
>>>> 
>>>> Hi Silviu,
>>>> 
>>>> I like the idea of teaching the compiler to use these instructions, but
>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd much prefer
>>>> we figure out how to teach the combiner and isel to be smarter directly
>>>> rather than doing something like that.
>>>> 
>>>> -Jim
>>>> 
>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Currently LLVM will not generate vaddl-type instructions when an extload
>>>>> node is in the input dag. All vector extload nodes are implemented with
>>>>> a load instruction and a lengthening instruction on ARM. If the lengthening
>>>>> instruction is followed by an add instruction, it would be better to
>>>>> generate a vaddl-type instruction instead.
>>>>> 
>>>>> To do this, we split all lengthening dag nodes which extend a type to more
>>>>> than twice its size (this is OK because these nodes would be implemented by
>>>>> multiple instructions anyway) into two lengthening nodes. One node will be
>>>>> later combined to form the extload node, and the other can be selected as
>>>>> part of a vaddl-type instruction (there are already patterns for this).
>>>>> 
>>>>> To stop the dag nodes from re-combining, we insert a BREAK node between
>>>>> them. The BREAK node is used as an optimization barrier for the DAG combiner,
>>>>> and might be useful in other cases as well. We remove the BREAK node in the
>>>>> instruction selection stage.
>>>>> 
>>>>> This approach allows the reuse of existing selection patterns.
>>>>> 
>>>>> This patch depends on the following other patch:
>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120827/149201.html
>>>>> 
>>>>> Please review!
>>>>> 
>>>>> Thanks,
>>>>> Silviu<patch.diff>_______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>>> 
>>> <zext_split.diff>
>> 
>> 
> <zext_split.diff>



