[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM
Silviu Baranga
silbar01 at arm.com
Fri Dec 7 04:33:12 PST 2012
Ping!
--Silviu
> -----Original Message-----
> From: Silviu Baranga [mailto:silbar01 at arm.com]
> Sent: 26 November 2012 16:38
> To: 'Jim Grosbach'
> Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-type
> instructions from extloads on ARM
>
> Hi Jim,
>
> I finally got around to re-writing this patch. Sorry for taking
> so long.
>
> I've refactored this patch to address your previous comments.
> The code now uses "isOperationLegalOrCustom" to decide if it
> can introduce the new zext/sext node for the intermediary type.
>
> On the ARM side I've set the zext action for 64 bit vectors to legal,
> and everything else to expand. All extload-type vector actions
> are also set to expand now, since there are no instructions that
> implement this.
>
> As for the "LowerExtendOp" name, I'm open to suggestions.
>
> This also fixes PR11570.
>
> Cheers,
> Silviu
>
> > -----Original Message-----
> > From: Jim Grosbach [mailto:grosbach at apple.com]
> > Sent: 02 October 2012 21:54
> > To: Silviu Baranga
> > Cc: llvm-commits at cs.uiuc.edu for LLVM; Evan Cheng
> > Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
> type
> > instructions from extloads on ARM
> >
> > Hi Silviu,
> >
> > Specific comments below.
> >
> > Evan, you're more expert than I in this part of the code. There's
> > something about this whole thing that feels a bit funny, but I can't
> > put my finger on it effectively enough to be very helpful. Mind
> having
> > a look? It'll probably be obvious one way or the other to you.
> >
> > > Index: test/CodeGen/ARM/vector-extend-narrow.ll
> >
> > General comment: thank you for making sure to include test cases.
> It's
> > amazing how many patches don't do that...
> >
> > > ===================================================================
> > > --- test/CodeGen/ARM/vector-extend-narrow.ll (revision 164117)
> > > +++ test/CodeGen/ARM/vector-extend-narrow.ll (working copy)
> > > @@ -73,3 +73,68 @@
> > > ret <4 x i32> %2
> > > }
> > >
> > > +; CHECK: k:
> > > +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > > + ; CHECK: vld1.32
> > > + ; CHECK: vmovl.u8
> > > + ; CHECK: vaddw.u16
> > > + %1 = load <4 x i8>* %in, align 4
> > > + %2 = zext <4 x i8> %1 to <4 x i32>
> > > + %3 = add <4 x i32> %2, %x
> > > + ret <4 x i32> %3
> > > +}
> > > +; CHECK: l:
> > > +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > > + ; CHECK: vld1.16
> > > + ; CHECK: vmovl.u8
> > > + ; CHECK: vmovl.u16
> > > + ; CHECK: vaddw.u32
> > > + %1 = load <2 x i8>* %in, align 4
> > > + %2 = zext <2 x i8> %1 to <2 x i64>
> > > + %3 = add <2 x i64> %2, %x
> > > + ret <2 x i64> %3
> > > +}
> > > +
> > > +; CHECK: m:
> > > +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > > + ; CHECK: vld1.32
> > > + ; CHECK: vmovl.u16
> > > + ; CHECK: vaddw.u32
> > > + %1 = load <2 x i16>* %in, align 4
> > > + %2 = zext <2 x i16> %1 to <2 x i64>
> > > + %3 = add <2 x i64> %2, %x
> > > + ret <2 x i64> %3
> > > +}
> > > +
> > > +; CHECK: n:
> > > +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > > + ; CHECK: vld1.32
> > > + ; CHECK: vmovl.s8
> > > + ; CHECK: vaddw.s16
> > > + %1 = load <4 x i8>* %in, align 4
> > > + %2 = sext <4 x i8> %1 to <4 x i32>
> > > + %3 = add <4 x i32> %2, %x
> > > + ret <4 x i32> %3
> > > +}
> > > +; CHECK: o:
> > > +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > > + ; CHECK: vld1.16
> > > + ; CHECK: vmovl.s8
> > > + ; CHECK: vmovl.s16
> > > + ; CHECK: vaddw.s32
> > > + %1 = load <2 x i8>* %in, align 4
> > > + %2 = sext <2 x i8> %1 to <2 x i64>
> > > + %3 = add <2 x i64> %2, %x
> > > + ret <2 x i64> %3
> > > +}
> > > +
> > > +; CHECK: p:
> > > +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > > + ; CHECK: vld1.32
> > > + ; CHECK: vmovl.s16
> > > + ; CHECK: vaddw.s32
> > > + %1 = load <2 x i16>* %in, align 4
> > > + %2 = sext <2 x i16> %1 to <2 x i64>
> > > + %3 = add <2 x i64> %2, %x
> > > + ret <2 x i64> %3
> > > +}
> > > Index: lib/Target/ARM/ARMISelLowering.h
> > > ===================================================================
> > > --- lib/Target/ARM/ARMISelLowering.h (revision 164117)
> > > +++ lib/Target/ARM/ARMISelLowering.h (working copy)
> > > @@ -403,6 +406,8 @@
> > > void addDRTypeForNEON(MVT VT);
> > > void addQRTypeForNEON(MVT VT);
> > >
> > > + void addExtendLegalizeTypeForNEON(MVT VT);
> > > +
> > > typedef SmallVector<std::pair<unsigned, SDValue>, 8>
> > RegsToPassVector;
> > > void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
> > > SDValue Chain, SDValue &Arg,
> > > Index: lib/Target/ARM/ARMISelLowering.cpp
> > > ===================================================================
> > > --- lib/Target/ARM/ARMISelLowering.cpp (revision 164117)
> > > +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
> > > @@ -158,6 +158,15 @@
> > > addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
> > > }
> > >
> > > +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
> > > + setLoadExtAction(ISD::EXTLOAD, VT, Legal);
> > > + setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
> > > + setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
> > > + setOperationAction(ISD::ANY_EXTEND, VT, Expand);
> > > + setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
> > > + setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
> > > +}
> >
> > Formatting. Only two space indentation required.
> >
> > > +
> > > static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
> > > if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
> > > return new TargetLoweringObjectFileMachO();
> > > @@ -556,15 +565,17 @@
> > > setTargetDAGCombine(ISD::FP_TO_UINT);
> > > setTargetDAGCombine(ISD::FDIV);
> > >
> > > - // It is legal to extload from v4i8 to v4i16 or v4i32.
> > > - MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
> > > - MVT::v4i16, MVT::v2i16,
> > > - MVT::v2i32};
> > > - for (unsigned i = 0; i < 6; ++i) {
> > > - setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
> > > - setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> > > - setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> > > - }
> > > + // There is no instruction to expand v4i8 to v4i32 and
> > > + // we must do this with two vmovl instructions. However,
> > > + // the vmovl instructions can be combined with other
> > > + // lengthening instructions, so we would prefer to lower
> > > + // (zextload x) -> (zext (zextload x).
> > > + addExtendLegalizeTypeForNEON(MVT::v8i8);
> > > + addExtendLegalizeTypeForNEON(MVT::v4i8);
> > > + addExtendLegalizeTypeForNEON(MVT::v2i8);
> > > + addExtendLegalizeTypeForNEON(MVT::v4i16);
> > > + addExtendLegalizeTypeForNEON(MVT::v2i16);
> > > + addExtendLegalizeTypeForNEON(MVT::v2i32);
> > > }
> > >
> > > // ARM and Thumb2 support UMLAL/SMLAL.
> > > Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 164117)
> > > +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy)
> > > @@ -4342,7 +4342,7 @@
> > > // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> > > // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> > > if ((ISD::isSEXTLoad(N0.getNode()) ||
> > ISD::isEXTLoad(N0.getNode())) &&
> > > - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > > + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> > !VT.isVector()) {
> > > LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> > > EVT MemVT = LN0->getMemoryVT();
> > > if ((!LegalOperations && !LN0->isVolatile()) ||
> > > @@ -4678,7 +4678,7 @@
> > > // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> > > // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> > > if ((ISD::isZEXTLoad(N0.getNode()) ||
> > ISD::isEXTLoad(N0.getNode())) &&
> > > - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > > + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> > !VT.isVector()) {
> > > LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> > > EVT MemVT = LN0->getMemoryVT();
> > > if ((!LegalOperations && !LN0->isVolatile()) ||
> > > Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/LegalizeTypes.h (revision 164117)
> > > +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h (working copy)
> > > @@ -287,6 +287,8 @@
> > > SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> > > SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> > >
> > > + SDValue LowerExtendOp(SDNode *N);
> > > +
> > > void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> ISD::CondCode
> > Code);
> > >
> > > //===-----------------------------------------------------------
> --
> > -------===//
> > > Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (revision
> > 164117)
> > > +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (working
> > copy)
> > > @@ -733,6 +733,51 @@
> > > // Integer Operand Promotion
> > > //===-------------------------------------------------------------
> --
> > -------===//
> > >
> > > +// LowerExtendOp - This function breaks up vector zext/sext nodes
> so
> > that the
> > > +// result will more closely resemble the operations available on
> the
> > target.
> > > +// The resulting nodes can be each combined with other nodes to
> > create legal
> > > +// operations.
> >
> > Some addition questions below, but in general, this could probably
> use
> > a bit more description in the comment about what's actually happening
> > here. Why this transform is a good idea and exactly what the
> transform
> > is, in particular.
> >
> > > +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
> >
> > Is "Lower" really the right name for this function?
> >
> > > + EVT VT0 = N->getValueType(0);
> > > + EVT VT1 = N->getOperand(0).getValueType();
> > > + DebugLoc dl = N->getDebugLoc();
> > > + // We only split the node if the operation is not legal,
> otherwise
> > we could get
> > > + // a legalization loop.
> > > + if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> > TargetLowering::Legal)
> >
> > You probably just want TLI.isOperationLegalOrCustom() here.
> >
> > > + return SDValue();
> > > +
> > > + if (!VT0.isVector())
> > > + return SDValue();
> >
> > Minor note, but I'd put this check first as it's cheaper and more
> > likely to fire than the check for the operation being legal.
> >
> > > +
> > > + // If there is already an instruction that could be directly
> > > + // selected for this there is no reason to expand since it will
> > > + // generate a redundant expand instruction (from type X to type
> > X).
> > > + if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> > > + return SDValue();
> >
> > I don't follow this. It seems pretty arbitrary. The comment doesn't
> > help much, since how do we know what instructions could be selected
> or
> > not? Can you clarify?
> >
> > > +
> > > + LLVMContext *Context = DAG.getContext();
> >
> > Can just have this be a reference since that's how it's used below
> and
> > we never can have a NULL context here.
> >
> > > +
> > > + // We need to get the intermediate type. This type will be twice
> > as
> > > + // small as the destination type,
> >
> > s/twice as small as/half the size of/
> >
> > > since all expand instructions are
> > > + // from a length X to a length 2*X and we have expand
> instructions
> > > + // for all destination types.
> >
> > Not sure I understand what you mean by "we have expand instructions
> for
> > all destination types." Can you clarify?
> >
> > > + EVT middleType = EVT::getVectorVT(*Context,
> >
> > Variable names should start with a Capitol letter. It's just function
> > names that are camel-case starting w/ lower case. That said, it's
> > unclear what "middle" means here. Perhaps a clearer name or a
> comment?
> >
> > > + EVT::getIntegerVT(*Context,
> > > +
> VT0.getScalarType().getSizeInBits()
> > / 2),
> > > + VT0.getVectorNumElements());
> > > +
> > > + // We don't want to expand if we can't use a legal intermediate
> > type
> > > + // since there wouldn't be any benefit from this and we might
> end
> > > + // up with nodes that cannot be selected in the instruction
> > selection
> > > + // stage.
> > > + if (!TLI.isTypeLegal(middleType))
> > > + return SDValue();
> > > +
> > > + return DAG.getNode(N->getOpcode(), dl, VT0,
> > > + DAG.getNode(N->getOpcode(), dl,
> > > + middleType, N->getOperand(0)));
> > > +}
> > > +
> > > /// PromoteIntegerOperand - This method is called when the
> specified
> > operand of
> > > /// the specified node is found to need promotion. At this point,
> > all of the
> > > /// result types of the node are known to be legal, but other
> > operands of the
> > > @@ -752,7 +797,10 @@
> > > #endif
> > > llvm_unreachable("Do not know how to promote this operator's
> > operand!");
> > >
> > > - case ISD::ANY_EXTEND: Res = PromoteIntOp_ANY_EXTEND(N); break;
> > > + case ISD::ANY_EXTEND:
> > > + Res = LowerExtendOp(N);
> > > + if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> > > + break;
> > > case ISD::ATOMIC_STORE:
> > > Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> > > break;
> > > @@ -774,15 +822,20 @@
> > > case ISD::SELECT: Res = PromoteIntOp_SELECT(N, OpNo);
> break;
> > > case ISD::SELECT_CC: Res = PromoteIntOp_SELECT_CC(N, OpNo);
> > break;
> > > case ISD::SETCC: Res = PromoteIntOp_SETCC(N, OpNo);
> break;
> > > - case ISD::SIGN_EXTEND: Res = PromoteIntOp_SIGN_EXTEND(N);
> break;
> > > + case ISD::SIGN_EXTEND:
> > > + Res = LowerExtendOp(N);
> > > + if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> > > + break;
> > > case ISD::SINT_TO_FP: Res = PromoteIntOp_SINT_TO_FP(N); break;
> > > case ISD::STORE: Res =
> > PromoteIntOp_STORE(cast<StoreSDNode>(N),
> > > OpNo); break;
> > > case ISD::TRUNCATE: Res = PromoteIntOp_TRUNCATE(N); break;
> > > case ISD::FP16_TO_FP32:
> > > case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N); break;
> > > - case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N);
> break;
> > > -
> > > + case ISD::ZERO_EXTEND:
> > > + Res = LowerExtendOp(N);
> > > + if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> > > + break;
> > > case ISD::SHL:
> > > case ISD::SRA:
> > > case ISD::SRL:
> > > Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> > > ===================================================================
> > > --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp (revision 164117)
> > > +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp (working copy)
> > > @@ -2565,7 +2565,8 @@
> > > VT.getVectorNumElements() ==
> > > Operand.getValueType().getVectorNumElements()) &&
> > > "Vector element count mismatch!");
> > > - if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> > ISD::ZERO_EXTEND)
> > > + if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> > ISD::ZERO_EXTEND) &&
> > > + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > > return getNode(OpOpcode, DL, VT, Operand.getNode()-
> > >getOperand(0));
> > > else if (OpOpcode == ISD::UNDEF)
> > > // sext(undef) = 0, because the top bits will all be the
> same.
> > > @@ -2581,7 +2582,9 @@
> > > VT.getVectorNumElements() ==
> > > Operand.getValueType().getVectorNumElements()) &&
> > > "Vector element count mismatch!");
> > > - if (OpOpcode == ISD::ZERO_EXTEND) // (zext (zext x)) ->
> (zext
> > x)
> > > + if (OpOpcode == ISD::ZERO_EXTEND &&
> > > + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > > + // (zext (zext x)) -> (zext x)
> > > return getNode(ISD::ZERO_EXTEND, DL, VT,
> > > Operand.getNode()->getOperand(0));
> > > else if (OpOpcode == ISD::UNDEF)
> > > @@ -2599,8 +2602,9 @@
> > > Operand.getValueType().getVectorNumElements()) &&
> > > "Vector element count mismatch!");
> > >
> > > - if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> ISD::SIGN_EXTEND
> > ||
> > > - OpOpcode == ISD::ANY_EXTEND)
> > > + if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> > ISD::SIGN_EXTEND ||
> > > + OpOpcode == ISD::ANY_EXTEND) &&
> > > + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > > // (ext (zext x)) -> (zext x) and (ext (sext x)) -> (sext
> x)
> > > return getNode(OpOpcode, DL, VT, Operand.getNode()-
> > >getOperand(0));
> > > else if (OpOpcode == ISD::UNDEF)
> >
> > On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com> wrote:
> >
> > > Ping.
> > >
> > >> -----Original Message-----
> > >> From: Silviu Baranga [mailto:silbar01 at arm.com]
> > >> Sent: 20 September 2012 15:48
> > >> To: 'Jim Grosbach'
> > >> Cc: llvm-commits at cs.uiuc.edu
> > >> Subject: RE: [llvm-commits] [patch] Enable the generation of
> vaddl-
> > type
> > >> instructions from extloads on ARM
> > >>
> > >> Hi Jim,
> > >>
> > >> Thanks for the comments! I am attaching a new patch with the
> > >> changes applied.
> > >>
> > >> To make the code target-independent I've added the check that
> > >> the 'middleType' is legal, which is equivalent to the previous
> > >> ARM specific check. I didn't check if the operation is legal,
> > >> since if the code got there it is marked as 'Expand' anyway.
> > >>
> > >> Cheers,
> > >> Silviu
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Jim Grosbach [mailto:grosbach at apple.com]
> > >>> Sent: 19 September 2012 17:56
> > >>> To: Silviu Baranga
> > >>> Cc: llvm-commits at cs.uiuc.edu
> > >>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> vaddl-
> > >> type
> > >>> instructions from extloads on ARM
> > >>>
> > >>> Hi Silviu,
> > >>>
> > >>> Making good progress! A few comments on the updated patch inline.
> > >>>
> > >>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
> > >>>>
> > ===================================================================
> > >>>> --- test/CodeGen/ARM/vector-extend-narrow.ll (revision 164117)
> > >>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll (working copy)
> > >>>> @@ -73,3 +73,68 @@
> > >>>> ret <4 x i32> %2
> > >>>> }
> > >>>>
> > >>>> +; CHECK: k:
> > >>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > >>>> + ; CHECK: vld1.32
> > >>>> + ; CHECK: vmovl.u8
> > >>>> + ; CHECK: vaddw.u16
> > >>>> + %1 = load <4 x i8>* %in, align 4
> > >>>> + %2 = zext <4 x i8> %1 to <4 x i32>
> > >>>> + %3 = add <4 x i32> %2, %x
> > >>>> + ret <4 x i32> %3
> > >>>> +}
> > >>>> +; CHECK: l:
> > >>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > >>>> + ; CHECK: vld1.16
> > >>>> + ; CHECK: vmovl.u8
> > >>>> + ; CHECK: vmovl.u16
> > >>>> + ; CHECK: vaddw.u32
> > >>>> + %1 = load <2 x i8>* %in, align 4
> > >>>> + %2 = zext <2 x i8> %1 to <2 x i64>
> > >>>> + %3 = add <2 x i64> %2, %x
> > >>>> + ret <2 x i64> %3
> > >>>> +}
> > >>>> +
> > >>>> +; CHECK: m:
> > >>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > >>>> + ; CHECK: vld1.32
> > >>>> + ; CHECK: vmovl.u16
> > >>>> + ; CHECK: vaddw.u32
> > >>>> + %1 = load <2 x i16>* %in, align 4
> > >>>> + %2 = zext <2 x i16> %1 to <2 x i64>
> > >>>> + %3 = add <2 x i64> %2, %x
> > >>>> + ret <2 x i64> %3
> > >>>> +}
> > >>>> +
> > >>>> +; CHECK: n:
> > >>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> > >>>> + ; CHECK: vld1.32
> > >>>> + ; CHECK: vmovl.s8
> > >>>> + ; CHECK: vaddw.s16
> > >>>> + %1 = load <4 x i8>* %in, align 4
> > >>>> + %2 = sext <4 x i8> %1 to <4 x i32>
> > >>>> + %3 = add <4 x i32> %2, %x
> > >>>> + ret <4 x i32> %3
> > >>>> +}
> > >>>> +; CHECK: o:
> > >>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> > >>>> + ; CHECK: vld1.16
> > >>>> + ; CHECK: vmovl.s8
> > >>>> + ; CHECK: vmovl.s16
> > >>>> + ; CHECK: vaddw.s32
> > >>>> + %1 = load <2 x i8>* %in, align 4
> > >>>> + %2 = sext <2 x i8> %1 to <2 x i64>
> > >>>> + %3 = add <2 x i64> %2, %x
> > >>>> + ret <2 x i64> %3
> > >>>> +}
> > >>>> +
> > >>>> +; CHECK: p:
> > >>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> > >>>> + ; CHECK: vld1.32
> > >>>> + ; CHECK: vmovl.s16
> > >>>> + ; CHECK: vaddw.s32
> > >>>> + %1 = load <2 x i16>* %in, align 4
> > >>>> + %2 = sext <2 x i16> %1 to <2 x i64>
> > >>>> + %3 = add <2 x i64> %2, %x
> > >>>> + ret <2 x i64> %3
> > >>>> +}
> > >>>> Index: lib/Target/ARM/ARMISelLowering.cpp
> > >>>>
> > ===================================================================
> > >>>> --- lib/Target/ARM/ARMISelLowering.cpp (revision 164117)
> > >>>> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
> > >>>> @@ -565,6 +570,12 @@
> > >>>> setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> > >>>> setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> > >>>> }
> > >>>> +
> > >>>> + for (unsigned i = 0; i < 6; ++i) {
> > >>>> + setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
> > >>>> + setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
> > >>>> + setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
> > >>>> + }
> > >>>> }
> > >>>
> > >>> I'd have sworn I commented about this bit before, but I can't
> find
> > it
> > >>> in my email logs.
> > >>>
> > >>> Anyways, This, and the loop above, should be moved out to a
> helper
> > >>> function and the loop removed. That is, the helper should perform
> > the
> > >>> setOperationAction() calls for a single type, then the top level
> > >>> function just calls that helper 6 times, once for each MVT. The
> > NEON
> > >>> bits for setting legal types do something similar.
> > >>>
> > >>>
> > >>>>
> > >>>> // ARM and Thumb2 support UMLAL/SMLAL.
> > >>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > >>>>
> > ===================================================================
> > >>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 164117)
> > >>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy)
> > >>>> @@ -4342,7 +4342,7 @@
> > >>>> // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> > >>>> // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> > >>>> if ((ISD::isSEXTLoad(N0.getNode()) ||
> > >>> ISD::isEXTLoad(N0.getNode())) &&
> > >>>> - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > >>>> + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> > >>> !VT.isVector()) {
> > >>>> LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> > >>>> EVT MemVT = LN0->getMemoryVT();
> > >>>> if ((!LegalOperations && !LN0->isVolatile()) ||
> > >>>> @@ -4678,7 +4678,7 @@
> > >>>> // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> > >>>> // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> > >>>> if ((ISD::isZEXTLoad(N0.getNode()) ||
> > >>> ISD::isEXTLoad(N0.getNode())) &&
> > >>>> - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> > >>>> + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
> > >>> !VT.isVector()) {
> > >>>> LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> > >>>> EVT MemVT = LN0->getMemoryVT();
> > >>>> if ((!LegalOperations && !LN0->isVolatile()) ||
> > >>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> > >>>>
> > ===================================================================
> > >>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h (revision 164117)
> > >>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h (working copy)
> > >>>> @@ -287,6 +287,8 @@
> > >>>> SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> > >>>> SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
> > >>>>
> > >>>> + SDValue ExpandExtend(SDNode *N);
> > >>>> +
> > >>>> void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
> > >> ISD::CondCode
> > >>> Code);
> > >>>>
> > >>>> //===---------------------------------------------------------
> --
> > >> --
> > >>> -------===//
> > >>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> > >>>>
> > ===================================================================
> > >>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> (revision
> > >>> 164117)
> > >>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> (working
> > >>> copy)
> > >>>> @@ -732,7 +732,44 @@
> > >>>> //===-----------------------------------------------------------
> --
> > >> --
> > >>> -------===//
> > >>>> // Integer Operand Promotion
> > >>>> //===-----------------------------------------------------------
> --
> > >> --
> > >>> -------===//
> > >>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
> > >>>
> > >>> This function really needs a comment explaining its purpose and
> > >> perhaps
> > >>> a different name. It's not really doing a full expand of the
> > extend,
> > >>> but rather doing some housekeeping beforehand to make sure the
> > >>> expansion that results from the Promote*() call will be
> reasonable
> > >> for
> > >>> the target. In any case, some comments about the motivation and
> > logic
> > >>> would be very helpful.
> > >>>
> > >>>> + EVT VT0 = N->getValueType(0);
> > >>>> + EVT VT1 = N->getOperand(0).getValueType();
> > >>>> + DebugLoc dl = N->getDebugLoc();
> > >>>> + // We only split the node if the operation is not legal,
> > >> otherwise
> > >>> we could get
> > >>>> + // a legalization loop.
> > >>>> + if (TLI.getOperationAction(N->getOpcode(), VT1) ==
> > >>> TargetLowering::Legal)
> > >>>> + return SDValue();
> > >>>>
> > >>>> + if (!VT0.isVector())
> > >>>> + return SDValue();
> > >>>> +
> > >>>> + // We don't want expand if the intermediate type is the
> v2i16,
> > >>>> + // since v2i16 is not a legal type. This only happens when
> the
> > >>>> + // destination has the v2i32 type and the source is v2i8.
> > >>>> + if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
> > >>>> + return SDValue();
> > >>>
> > >>> This is in generic code now, so we can't make assumptions about
> > what
> > >>> types are legal. We need to ask the target. This function needs
> to
> > be
> > >>> made more generic, basically. For example, here we assume that
> > v2i16
> > >>> isn't legal. That's true for ARM, but quite possibly untrue for
> > other
> > >>> targets that support 32-bit vector types. Likewise, we may need
> to
> > >>> check for which operations are legal as well as which types.
> > >>>
> > >>>> + // If there is already an instruction that could be directly
> > >>>> + // selected for this there is no reason to expand since it
> will
> > >>>> + // generate a redundant expand instruction (from type X to
> type
> > >>> X).
> > >>>> + if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> > >>>> + return SDValue();
> > >>>> +
> > >>>> + LLVMContext *Context = DAG.getContext();
> > >>>> +
> > >>>> + // We need to get the intermediate type. This type will be
> > twice
> > >>> as
> > >>>> + // small as the destination type, since all expand
> instructions
> > >>> are
> > >>>> + // from a length X to a length 2*X and we have expand
> > >> instructions
> > >>>> + // for all destination types.
> > >>>> + EVT middleType = EVT::getVectorVT(*Context,
> > >>>> + EVT::getIntegerVT(*Context,
> > >>>> +
> > >> VT0.getScalarType().getSizeInBits()
> > >>> / 2),
> > >>>> + VT0.getVectorNumElements());
> > >>>> + return DAG.getNode(N->getOpcode(), dl, VT0,
> > >>>> + DAG.getNode(N->getOpcode(), dl,
> > >>>> + middleType, N-
> >getOperand(0)));
> > >>>> +}
> > >>>> +
> > >>>> /// PromoteIntegerOperand - This method is called when the
> > >> specified
> > >>> operand of
> > >>>> /// the specified node is found to need promotion. At this
> point,
> > >>> all of the
> > >>>> /// result types of the node are known to be legal, but other
> > >>> operands of the
> > >>>> @@ -752,7 +789,10 @@
> > >>>> #endif
> > >>>> llvm_unreachable("Do not know how to promote this operator's
> > >>> operand!");
> > >>>>
> > >>>> - case ISD::ANY_EXTEND: Res = PromoteIntOp_ANY_EXTEND(N);
> > break;
> > >>>> + case ISD::ANY_EXTEND:
> > >>>> + Res = ExpandExtend(N);
> > >>>> + if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> > >>>> + break;
> > >>>> case ISD::ATOMIC_STORE:
> > >>>> Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> > >>>> break;
> > >>>> @@ -774,15 +814,20 @@
> > >>>> case ISD::SELECT: Res = PromoteIntOp_SELECT(N, OpNo);
> > >> break;
> > >>>> case ISD::SELECT_CC: Res = PromoteIntOp_SELECT_CC(N, OpNo);
> > >>> break;
> > >>>> case ISD::SETCC: Res = PromoteIntOp_SETCC(N, OpNo);
> > >> break;
> > >>>> - case ISD::SIGN_EXTEND: Res = PromoteIntOp_SIGN_EXTEND(N);
> > >> break;
> > >>>> + case ISD::SIGN_EXTEND:
> > >>>> + Res = ExpandExtend(N);
> > >>>> + if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> > >>>> + break;
> > >>>> case ISD::SINT_TO_FP: Res = PromoteIntOp_SINT_TO_FP(N);
> break;
> > >>>> case ISD::STORE: Res =
> > >>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
> > >>>> OpNo); break;
> > >>>> case ISD::TRUNCATE: Res = PromoteIntOp_TRUNCATE(N); break;
> > >>>> case ISD::FP16_TO_FP32:
> > >>>> case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N);
> break;
> > >>>> - case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N);
> > >> break;
> > >>>> -
> > >>>> + case ISD::ZERO_EXTEND:
> > >>>> + Res = ExpandExtend(N);
> > >>>> + if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> > >>>> + break;
> > >>>> case ISD::SHL:
> > >>>> case ISD::SRA:
> > >>>> case ISD::SRL:
> > >>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> > >>>>
> > ===================================================================
> > >>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp (revision 164117)
> > >>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp (working copy)
> > >>>> @@ -2565,7 +2565,8 @@
> > >>>> VT.getVectorNumElements() ==
> > >>>> Operand.getValueType().getVectorNumElements()) &&
> > >>>> "Vector element count mismatch!");
> > >>>> - if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> > >>> ISD::ZERO_EXTEND)
> > >>>> + if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
> > >>> ISD::ZERO_EXTEND) &&
> > >>>> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > >>>> return getNode(OpOpcode, DL, VT, Operand.getNode()-
> > >>>> getOperand(0));
> > >>>> else if (OpOpcode == ISD::UNDEF)
> > >>>> // sext(undef) = 0, because the top bits will all be the
> > >> same.
> > >>>> @@ -2581,7 +2582,9 @@
> > >>>> VT.getVectorNumElements() ==
> > >>>> Operand.getValueType().getVectorNumElements()) &&
> > >>>> "Vector element count mismatch!");
> > >>>> - if (OpOpcode == ISD::ZERO_EXTEND) // (zext (zext x)) ->
> > >> (zext
> > >>> x)
> > >>>> + if (OpOpcode == ISD::ZERO_EXTEND &&
> > >>>> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > >>>> + // (zext (zext x)) -> (zext x)
> > >>>> return getNode(ISD::ZERO_EXTEND, DL, VT,
> > >>>> Operand.getNode()->getOperand(0));
> > >>>> else if (OpOpcode == ISD::UNDEF)
> > >>>> @@ -2599,8 +2602,9 @@
> > >>>> Operand.getValueType().getVectorNumElements()) &&
> > >>>> "Vector element count mismatch!");
> > >>>>
> > >>>> - if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> > >> ISD::SIGN_EXTEND
> > >>> ||
> > >>>> - OpOpcode == ISD::ANY_EXTEND)
> > >>>> + if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
> > >>> ISD::SIGN_EXTEND ||
> > >>>> + OpOpcode == ISD::ANY_EXTEND) &&
> > >>>> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> > >>>> // (ext (zext x)) -> (zext x) and (ext (sext x)) ->
> (sext
> > >> x)
> > >>>> return getNode(OpOpcode, DL, VT, Operand.getNode()-
> > >>>> getOperand(0));
> > >>>> else if (OpOpcode == ISD::UNDEF)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com>
> > wrote:
> > >>>
> > >>>> Hi Jim,
> > >>>>
> > >>>> Thanks for the review. It does make sense to do this in a target
> > >>>> independent way. I've attached a new patch that does this.
> > >>>>
> > >>>> I've moved the code ARM-specific code into the legalizer and
> also
> > >>>> added tests for sext nodes. It appears that I was mistaken
> before
> > >>>> and (sext (sextload)) patterns were not prevented from folding
> for
> > >>>> vector types. This was stopping the generation of vaddl-type
> > >>>> instructions. I think that stopping the folding of (sext
> > >> (sextload))
> > >>>> patterns is still OK, because the merging would result in a
> > illegal
> > >>>> operation.
> > >>>>
> > >>>> Cheers,
> > >>>> Silviu
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> > >>>>> Sent: 17 September 2012 18:09
> > >>>>> To: Silviu Baranga
> > >>>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> > >> vaddl-
> > >>> type
> > >>>>> instructions from extloads on ARM
> > >>>>>
> > >>>>> Hi Silviu,
> > >>>>>
> > >>>>> Thanks for the update. This is definitely much better. Why do
> we
> > >>> need
> > >>>>> any ARM specific code here, though? Given the problem
> > description,
> > >>> it
> > >>>>> seems like this is something the legalizer should be able to
> > >> reason
> > >>>>> about without needing custom handling in the target.
> > >>>>>
> > >>>>> -Jim
> > >>>>>
> > >>>>>
> > >>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
> > >> wrote:
> > >>>>>
> > >>>>>> Hi Jim,
> > >>>>>>
> > >>>>>> Thanks for the review. The BREAK node solution was a bit ugly.
> > >>>>>>
> > >>>>>> I've attached a new patch that solves this without a BREAK
> node.
> > >>>>>> This is done by stopping the folding of nodes zext nodes when
> > >>>>>> this will result in a non-legal operation.
> > >>>>>>
> > >>>>>> I also had to stop the (zext (zextload x)) folding for vector
> > >>>>>> types. This is already being done for sext nodes.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Silviu
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
> > >>>>>>> Sent: 06 September 2012 01:42
> > >>>>>>> To: Silviu Baranga
> > >>>>>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
> > >>> vaddl-
> > >>>>> type
> > >>>>>>> instructions from extloads on ARM
> > >>>>>>>
> > >>>>>>> Hi Silviu,
> > >>>>>>>
> > >>>>>>> I like the idea of teaching the compiler to use these
> > >>> instructions,
> > >>>>> but
> > >>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd
> much
> > >>>>> prefer
> > >>>>>>> we figure out how to teach the combiner and isel to be
> smarter
> > >>>>> directly
> > >>>>>>> rather than doing something like that.
> > >>>>>>>
> > >>>>>>> -Jim
> > >>>>>>>
> > >>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com>
> > >>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Currently LLVM will not generate a vaddl-type instructions
> > when
> > >>> an
> > >>>>>>> extload
> > >>>>>>>> node is in the input dag. All vector extload nodes are
> > >>> implemented
> > >>>>>>> with
> > >>>>>>>> a load instruction and a lengthening instruction on ARM. If
> > the
> > >>>>>>> lengthening
> > >>>>>>>> instruction is followed by an add instruction, it would be
> > >> better
> > >>>>> to
> > >>>>>>>> generate
> > >>>>>>>> a vaddl-type instruction instead.
> > >>>>>>>>
> > >>>>>>>> To do this, we split all lengthening dag nodes which extend
> a
> > >>> type
> > >>>>> to
> > >>>>>>> more
> > >>>>>>>> then twice its size (this is OK because these nodes would be
> > >>>>>>> implemented by
> > >>>>>>>> multiple instructions anyway) into two lengthening nodes.
> One
> > >>> node
> > >>>>>>> will be
> > >>>>>>>> later
> > >>>>>>>> combined to form the extload node, and the other can be
> > >> selected
> > >>> as
> > >>>>>>> part of
> > >>>>>>>> a
> > >>>>>>>> vaddl-type instruction (there are already patterns for
> this).
> > >>>>>>>>
> > >>>>>>>> To stop the dag nodes from re-combining, we insert a BREAK
> > node
> > >>>>>>> between
> > >>>>>>>> them. The
> > >>>>>>>> BREAK node is used as an optimization barrier for the DAG
> > >>> combiner,
> > >>>>>>> and
> > >>>>>>>> might be
> > >>>>>>>> useful in other cases as well. We remove the BREAK node in
> the
> > >>>>>>> instruction
> > >>>>>>>> selection stage.
> > >>>>>>>>
> > >>>>>>>> This approach allows the reuse of existing selection
> patterns.
> > >>>>>>>>
> > >>>>>>>> This patch depends on the following other patch:
> > >>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-
> > >>>>>>> 20120827/149201.
> > >>>>>>>> html
> > >>>>>>>>
> > >>>>>>>> Please review!
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >> Silviu<patch.diff>_______________________________________________
> > >>>>>>>> llvm-commits mailing list
> > >>>>>>>> llvm-commits at cs.uiuc.edu
> > >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>>>>>>
> > >>>>>>>
> > >>>>>> <zext_split.diff>
> > >>>>>
> > >>>>>
> > >>>> <zext_split.diff>
> > >>>
> > >>>
> > >
> > >
> > >
> > >
> >
> >
More information about the llvm-commits
mailing list