[llvm-commits] [patch] Enable the generation of vaddl-type instructions from extloads on ARM
Jim Grosbach
grosbach at apple.com
Tue Oct 2 13:54:17 PDT 2012
Hi Silviu,
Specific comments below.
Evan, you're more expert than I in this part of the code. There's something about this whole thing that feels a bit funny, but I can't put my finger on it effectively enough to be very helpful. Mind having a look? It'll probably be obvious one way or the other to you.
> Index: test/CodeGen/ARM/vector-extend-narrow.ll
General comment: thank you for making sure to include test cases. It's amazing how many patches don't do that...
> ===================================================================
> --- test/CodeGen/ARM/vector-extend-narrow.ll (revision 164117)
> +++ test/CodeGen/ARM/vector-extend-narrow.ll (working copy)
> @@ -73,3 +73,68 @@
> ret <4 x i32> %2
> }
>
> +; CHECK: k:
> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
> + ; CHECK: vld1.32
> + ; CHECK: vmovl.u8
> + ; CHECK: vaddw.u16
> + %1 = load <4 x i8>* %in, align 4
> + %2 = zext <4 x i8> %1 to <4 x i32>
> + %3 = add <4 x i32> %2, %x
> + ret <4 x i32> %3
> +}
> +; CHECK: l:
> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
> + ; CHECK: vld1.16
> + ; CHECK: vmovl.u8
> + ; CHECK: vmovl.u16
> + ; CHECK: vaddw.u32
> + %1 = load <2 x i8>* %in, align 4
> + %2 = zext <2 x i8> %1 to <2 x i64>
> + %3 = add <2 x i64> %2, %x
> + ret <2 x i64> %3
> +}
> +
> +; CHECK: m:
> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
> + ; CHECK: vld1.32
> + ; CHECK: vmovl.u16
> + ; CHECK: vaddw.u32
> + %1 = load <2 x i16>* %in, align 4
> + %2 = zext <2 x i16> %1 to <2 x i64>
> + %3 = add <2 x i64> %2, %x
> + ret <2 x i64> %3
> +}
> +
> +; CHECK: n:
> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
> + ; CHECK: vld1.32
> + ; CHECK: vmovl.s8
> + ; CHECK: vaddw.s16
> + %1 = load <4 x i8>* %in, align 4
> + %2 = sext <4 x i8> %1 to <4 x i32>
> + %3 = add <4 x i32> %2, %x
> + ret <4 x i32> %3
> +}
> +; CHECK: o:
> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
> + ; CHECK: vld1.16
> + ; CHECK: vmovl.s8
> + ; CHECK: vmovl.s16
> + ; CHECK: vaddw.s32
> + %1 = load <2 x i8>* %in, align 4
> + %2 = sext <2 x i8> %1 to <2 x i64>
> + %3 = add <2 x i64> %2, %x
> + ret <2 x i64> %3
> +}
> +
> +; CHECK: p:
> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
> + ; CHECK: vld1.32
> + ; CHECK: vmovl.s16
> + ; CHECK: vaddw.s32
> + %1 = load <2 x i16>* %in, align 4
> + %2 = sext <2 x i16> %1 to <2 x i64>
> + %3 = add <2 x i64> %2, %x
> + ret <2 x i64> %3
> +}
> Index: lib/Target/ARM/ARMISelLowering.h
> ===================================================================
> --- lib/Target/ARM/ARMISelLowering.h (revision 164117)
> +++ lib/Target/ARM/ARMISelLowering.h (working copy)
> @@ -403,6 +406,8 @@
> void addDRTypeForNEON(MVT VT);
> void addQRTypeForNEON(MVT VT);
>
> + void addExtendLegalizeTypeForNEON(MVT VT);
> +
> typedef SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPassVector;
> void PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG,
> SDValue Chain, SDValue &Arg,
> Index: lib/Target/ARM/ARMISelLowering.cpp
> ===================================================================
> --- lib/Target/ARM/ARMISelLowering.cpp (revision 164117)
> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
> @@ -158,6 +158,15 @@
> addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
> }
>
> +void ARMTargetLowering::addExtendLegalizeTypeForNEON(MVT VT) {
> + setLoadExtAction(ISD::EXTLOAD, VT, Legal);
> + setLoadExtAction(ISD::SEXTLOAD, VT, Legal);
> + setLoadExtAction(ISD::ZEXTLOAD, VT, Legal);
> + setOperationAction(ISD::ANY_EXTEND, VT, Expand);
> + setOperationAction(ISD::SIGN_EXTEND, VT, Expand);
> + setOperationAction(ISD::ZERO_EXTEND, VT, Expand);
> +}
Formatting: only two-space indentation is needed here.
> +
> static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) {
> if (TM.getSubtarget<ARMSubtarget>().isTargetDarwin())
> return new TargetLoweringObjectFileMachO();
> @@ -556,15 +565,17 @@
> setTargetDAGCombine(ISD::FP_TO_UINT);
> setTargetDAGCombine(ISD::FDIV);
>
> - // It is legal to extload from v4i8 to v4i16 or v4i32.
> - MVT Tys[6] = {MVT::v8i8, MVT::v4i8, MVT::v2i8,
> - MVT::v4i16, MVT::v2i16,
> - MVT::v2i32};
> - for (unsigned i = 0; i < 6; ++i) {
> - setLoadExtAction(ISD::EXTLOAD, Tys[i], Legal);
> - setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
> - setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
> - }
> + // There is no instruction to expand v4i8 to v4i32 and
> + // we must do this with two vmovl instructions. However,
> + // the vmovl instructions can be combined with other
> + // lengthening instructions, so we would prefer to lower
> + // (zextload x) -> (zext (zextload x).
> + addExtendLegalizeTypeForNEON(MVT::v8i8);
> + addExtendLegalizeTypeForNEON(MVT::v4i8);
> + addExtendLegalizeTypeForNEON(MVT::v2i8);
> + addExtendLegalizeTypeForNEON(MVT::v4i16);
> + addExtendLegalizeTypeForNEON(MVT::v2i16);
> + addExtendLegalizeTypeForNEON(MVT::v2i32);
> }
>
> // ARM and Thumb2 support UMLAL/SMLAL.
> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 164117)
> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy)
> @@ -4342,7 +4342,7 @@
> // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
> // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
> if ((ISD::isSEXTLoad(N0.getNode()) || ISD::isEXTLoad(N0.getNode())) &&
> - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() && !VT.isVector()) {
> LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> EVT MemVT = LN0->getMemoryVT();
> if ((!LegalOperations && !LN0->isVolatile()) ||
> @@ -4678,7 +4678,7 @@
> // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
> // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
> if ((ISD::isZEXTLoad(N0.getNode()) || ISD::isEXTLoad(N0.getNode())) &&
> - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
> + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() && !VT.isVector()) {
> LoadSDNode *LN0 = cast<LoadSDNode>(N0);
> EVT MemVT = LN0->getMemoryVT();
> if ((!LegalOperations && !LN0->isVolatile()) ||
> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
> ===================================================================
> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h (revision 164117)
> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h (working copy)
> @@ -287,6 +287,8 @@
> SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
> SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>
> + SDValue LowerExtendOp(SDNode *N);
> +
> void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
>
> //===--------------------------------------------------------------------===//
> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (revision 164117)
> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (working copy)
> @@ -733,6 +733,51 @@
> // Integer Operand Promotion
> //===----------------------------------------------------------------------===//
>
> +// LowerExtendOp - This function breaks up vector zext/sext nodes so that the
> +// result will more closely resemble the operations available on the target.
> +// The resulting nodes can be each combined with other nodes to create legal
> +// operations.
Some additional questions below, but in general, this could probably use a bit more description in the comment about what's actually happening here: why this transform is a good idea and exactly what the transform is, in particular.
> +SDValue DAGTypeLegalizer::LowerExtendOp(SDNode *N) {
Is "Lower" really the right name for this function?
> + EVT VT0 = N->getValueType(0);
> + EVT VT1 = N->getOperand(0).getValueType();
> + DebugLoc dl = N->getDebugLoc();
> + // We only split the node if the operation is not legal, otherwise we could get
> + // a legalization loop.
> + if (TLI.getOperationAction(N->getOpcode(), VT1) == TargetLowering::Legal)
You probably just want TLI.isOperationLegalOrCustom() here.
> + return SDValue();
> +
> + if (!VT0.isVector())
> + return SDValue();
Minor note, but I'd put this check first as it's cheaper and more likely to fire than the check for the operation being legal.
> +
> + // If there is already an instruction that could be directly
> + // selected for this there is no reason to expand since it will
> + // generate a redundant expand instruction (from type X to type X).
> + if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
> + return SDValue();
I don't follow this. It seems pretty arbitrary. The comment doesn't help much, since how do we know what instructions could be selected or not? Can you clarify?
> +
> + LLVMContext *Context = DAG.getContext();
This can just be a reference, since that's how it's used below and the context can never be NULL here.
> +
> + // We need to get the intermediate type. This type will be twice as
> + // small as the destination type,
s/twice as small as/half the size of/
> since all expand instructions are
> + // from a length X to a length 2*X and we have expand instructions
> + // for all destination types.
Not sure I understand what you mean by "we have expand instructions for all destination types." Can you clarify?
> + EVT middleType = EVT::getVectorVT(*Context,
Variable names should start with a capital letter; it's only function names that are camelCase starting with lower case. That said, it's unclear what "middle" means here. Perhaps a clearer name or a comment?
> + EVT::getIntegerVT(*Context,
> + VT0.getScalarType().getSizeInBits() / 2),
> + VT0.getVectorNumElements());
> +
> + // We don't want to expand if we can't use a legal intermediate type
> + // since there wouldn't be any benefit from this and we might end
> + // up with nodes that cannot be selected in the instruction selection
> + // stage.
> + if (!TLI.isTypeLegal(middleType))
> + return SDValue();
> +
> + return DAG.getNode(N->getOpcode(), dl, VT0,
> + DAG.getNode(N->getOpcode(), dl,
> + middleType, N->getOperand(0)));
> +}
> +
> /// PromoteIntegerOperand - This method is called when the specified operand of
> /// the specified node is found to need promotion. At this point, all of the
> /// result types of the node are known to be legal, but other operands of the
> @@ -752,7 +797,10 @@
> #endif
> llvm_unreachable("Do not know how to promote this operator's operand!");
>
> - case ISD::ANY_EXTEND: Res = PromoteIntOp_ANY_EXTEND(N); break;
> + case ISD::ANY_EXTEND:
> + Res = LowerExtendOp(N);
> + if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
> + break;
> case ISD::ATOMIC_STORE:
> Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
> break;
> @@ -774,15 +822,20 @@
> case ISD::SELECT: Res = PromoteIntOp_SELECT(N, OpNo); break;
> case ISD::SELECT_CC: Res = PromoteIntOp_SELECT_CC(N, OpNo); break;
> case ISD::SETCC: Res = PromoteIntOp_SETCC(N, OpNo); break;
> - case ISD::SIGN_EXTEND: Res = PromoteIntOp_SIGN_EXTEND(N); break;
> + case ISD::SIGN_EXTEND:
> + Res = LowerExtendOp(N);
> + if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
> + break;
> case ISD::SINT_TO_FP: Res = PromoteIntOp_SINT_TO_FP(N); break;
> case ISD::STORE: Res = PromoteIntOp_STORE(cast<StoreSDNode>(N),
> OpNo); break;
> case ISD::TRUNCATE: Res = PromoteIntOp_TRUNCATE(N); break;
> case ISD::FP16_TO_FP32:
> case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N); break;
> - case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N); break;
> -
> + case ISD::ZERO_EXTEND:
> + Res = LowerExtendOp(N);
> + if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
> + break;
> case ISD::SHL:
> case ISD::SRA:
> case ISD::SRL:
> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp (revision 164117)
> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp (working copy)
> @@ -2565,7 +2565,8 @@
> VT.getVectorNumElements() ==
> Operand.getValueType().getVectorNumElements()) &&
> "Vector element count mismatch!");
> - if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
> + if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND) &&
> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
> else if (OpOpcode == ISD::UNDEF)
> // sext(undef) = 0, because the top bits will all be the same.
> @@ -2581,7 +2582,9 @@
> VT.getVectorNumElements() ==
> Operand.getValueType().getVectorNumElements()) &&
> "Vector element count mismatch!");
> - if (OpOpcode == ISD::ZERO_EXTEND) // (zext (zext x)) -> (zext x)
> + if (OpOpcode == ISD::ZERO_EXTEND &&
> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> + // (zext (zext x)) -> (zext x)
> return getNode(ISD::ZERO_EXTEND, DL, VT,
> Operand.getNode()->getOperand(0));
> else if (OpOpcode == ISD::UNDEF)
> @@ -2599,8 +2602,9 @@
> Operand.getValueType().getVectorNumElements()) &&
> "Vector element count mismatch!");
>
> - if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode == ISD::SIGN_EXTEND ||
> - OpOpcode == ISD::ANY_EXTEND)
> + if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode == ISD::SIGN_EXTEND ||
> + OpOpcode == ISD::ANY_EXTEND) &&
> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
> // (ext (zext x)) -> (zext x) and (ext (sext x)) -> (sext x)
> return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
> else if (OpOpcode == ISD::UNDEF)
On Oct 1, 2012, at 8:09 AM, Silviu Baranga <silbar01 at arm.com> wrote:
> Ping.
>
>> -----Original Message-----
>> From: Silviu Baranga [mailto:silbar01 at arm.com]
>> Sent: 20 September 2012 15:48
>> To: 'Jim Grosbach'
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: RE: [llvm-commits] [patch] Enable the generation of vaddl-type
>> instructions from extloads on ARM
>>
>> Hi Jim,
>>
>> Thanks for the comments! I am attaching a new patch with the
>> changes applied.
>>
>> To make the code target-independent I've added the check that
>> the 'middleType' is legal, which is equivalent to the previous
>> ARM specific check. I didn't check if the operation is legal,
>> since if the code got there it is marked as 'Expand' anyway.
>>
>> Cheers,
>> Silviu
>>
>>
>>> -----Original Message-----
>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>> Sent: 19 September 2012 17:56
>>> To: Silviu Baranga
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] [patch] Enable the generation of vaddl-
>> type
>>> instructions from extloads on ARM
>>>
>>> Hi Silviu,
>>>
>>> Making good progress! A few comments on the updated patch inline.
>>>
>>>> Index: test/CodeGen/ARM/vector-extend-narrow.ll
>>>> ===================================================================
>>>> --- test/CodeGen/ARM/vector-extend-narrow.ll (revision 164117)
>>>> +++ test/CodeGen/ARM/vector-extend-narrow.ll (working copy)
>>>> @@ -73,3 +73,68 @@
>>>> ret <4 x i32> %2
>>>> }
>>>>
>>>> +; CHECK: k:
>>>> +define <4 x i32> @k(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>> + ; CHECK: vld1.32
>>>> + ; CHECK: vmovl.u8
>>>> + ; CHECK: vaddw.u16
>>>> + %1 = load <4 x i8>* %in, align 4
>>>> + %2 = zext <4 x i8> %1 to <4 x i32>
>>>> + %3 = add <4 x i32> %2, %x
>>>> + ret <4 x i32> %3
>>>> +}
>>>> +; CHECK: l:
>>>> +define <2 x i64> @l(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>> + ; CHECK: vld1.16
>>>> + ; CHECK: vmovl.u8
>>>> + ; CHECK: vmovl.u16
>>>> + ; CHECK: vaddw.u32
>>>> + %1 = load <2 x i8>* %in, align 4
>>>> + %2 = zext <2 x i8> %1 to <2 x i64>
>>>> + %3 = add <2 x i64> %2, %x
>>>> + ret <2 x i64> %3
>>>> +}
>>>> +
>>>> +; CHECK: m:
>>>> +define <2 x i64> @m(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>> + ; CHECK: vld1.32
>>>> + ; CHECK: vmovl.u16
>>>> + ; CHECK: vaddw.u32
>>>> + %1 = load <2 x i16>* %in, align 4
>>>> + %2 = zext <2 x i16> %1 to <2 x i64>
>>>> + %3 = add <2 x i64> %2, %x
>>>> + ret <2 x i64> %3
>>>> +}
>>>> +
>>>> +; CHECK: n:
>>>> +define <4 x i32> @n(<4 x i8>* %in, <4 x i32> %x) nounwind {
>>>> + ; CHECK: vld1.32
>>>> + ; CHECK: vmovl.s8
>>>> + ; CHECK: vaddw.s16
>>>> + %1 = load <4 x i8>* %in, align 4
>>>> + %2 = sext <4 x i8> %1 to <4 x i32>
>>>> + %3 = add <4 x i32> %2, %x
>>>> + ret <4 x i32> %3
>>>> +}
>>>> +; CHECK: o:
>>>> +define <2 x i64> @o(<2 x i8>* %in, <2 x i64> %x) nounwind {
>>>> + ; CHECK: vld1.16
>>>> + ; CHECK: vmovl.s8
>>>> + ; CHECK: vmovl.s16
>>>> + ; CHECK: vaddw.s32
>>>> + %1 = load <2 x i8>* %in, align 4
>>>> + %2 = sext <2 x i8> %1 to <2 x i64>
>>>> + %3 = add <2 x i64> %2, %x
>>>> + ret <2 x i64> %3
>>>> +}
>>>> +
>>>> +; CHECK: p:
>>>> +define <2 x i64> @p(<2 x i16>* %in, <2 x i64> %x) nounwind {
>>>> + ; CHECK: vld1.32
>>>> + ; CHECK: vmovl.s16
>>>> + ; CHECK: vaddw.s32
>>>> + %1 = load <2 x i16>* %in, align 4
>>>> + %2 = sext <2 x i16> %1 to <2 x i64>
>>>> + %3 = add <2 x i64> %2, %x
>>>> + ret <2 x i64> %3
>>>> +}
>>>> Index: lib/Target/ARM/ARMISelLowering.cpp
>>>> ===================================================================
>>>> --- lib/Target/ARM/ARMISelLowering.cpp (revision 164117)
>>>> +++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
>>>> @@ -565,6 +570,12 @@
>>>> setLoadExtAction(ISD::ZEXTLOAD, Tys[i], Legal);
>>>> setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
>>>> }
>>>> +
>>>> + for (unsigned i = 0; i < 6; ++i) {
>>>> + setOperationAction(ISD::ZERO_EXTEND, Tys[i], Expand);
>>>> + setOperationAction(ISD::SIGN_EXTEND, Tys[i], Expand);
>>>> + setOperationAction(ISD::ANY_EXTEND, Tys[i], Expand);
>>>> + }
>>>> }
>>>
>>> I'd have sworn I commented about this bit before, but I can't find it
>>> in my email logs.
>>>
>>> Anyways, This, and the loop above, should be moved out to a helper
>>> function and the loop removed. That is, the helper should perform the
>>> setOperationAction() calls for a single type, then the top level
>>> function just calls that helper 6 times, once for each MVT. The NEON
>>> bits for setting legal types do something similar.
>>>
>>>
>>>>
>>>> // ARM and Thumb2 support UMLAL/SMLAL.
>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 164117)
>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy)
>>>> @@ -4342,7 +4342,7 @@
>>>> // fold (sext (sextload x)) -> (sext (truncate (sextload x)))
>>>> // fold (sext ( extload x)) -> (sext (truncate (sextload x)))
>>>> if ((ISD::isSEXTLoad(N0.getNode()) ||
>>> ISD::isEXTLoad(N0.getNode())) &&
>>>> - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>> + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>> !VT.isVector()) {
>>>> LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>> EVT MemVT = LN0->getMemoryVT();
>>>> if ((!LegalOperations && !LN0->isVolatile()) ||
>>>> @@ -4678,7 +4678,7 @@
>>>> // fold (zext (zextload x)) -> (zext (truncate (zextload x)))
>>>> // fold (zext ( extload x)) -> (zext (truncate (zextload x)))
>>>> if ((ISD::isZEXTLoad(N0.getNode()) ||
>>> ISD::isEXTLoad(N0.getNode())) &&
>>>> - ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse()) {
>>>> + ISD::isUNINDEXEDLoad(N0.getNode()) && N0.hasOneUse() &&
>>> !VT.isVector()) {
>>>> LoadSDNode *LN0 = cast<LoadSDNode>(N0);
>>>> EVT MemVT = LN0->getMemoryVT();
>>>> if ((!LegalOperations && !LN0->isVolatile()) ||
>>>> Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/LegalizeTypes.h (revision 164117)
>>>> +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h (working copy)
>>>> @@ -287,6 +287,8 @@
>>>> SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
>>>> SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
>>>>
>>>> + SDValue ExpandExtend(SDNode *N);
>>>> +
>>>> void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS,
>> ISD::CondCode
>>> Code);
>>>>
>>>> //===-----------------------------------------------------------
>> --
>>> -------===//
>>>> Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (revision
>>> 164117)
>>>> +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (working
>>> copy)
>>>> @@ -732,7 +732,44 @@
>>>> //===-------------------------------------------------------------
>> --
>>> -------===//
>>>> // Integer Operand Promotion
>>>> //===-------------------------------------------------------------
>> --
>>> -------===//
>>>> +SDValue DAGTypeLegalizer::ExpandExtend(SDNode *N) {
>>>
>>> This function really needs a comment explaining its purpose and
>> perhaps
>>> a different name. It's not really doing a full expand of the extend,
>>> but rather doing some housekeeping beforehand to make sure the
>>> expansion that results from the Promote*() call will be reasonable
>> for
>>> the target. In any case, some comments about the motivation and logic
>>> would be very helpful.
>>>
>>>> + EVT VT0 = N->getValueType(0);
>>>> + EVT VT1 = N->getOperand(0).getValueType();
>>>> + DebugLoc dl = N->getDebugLoc();
>>>> + // We only split the node if the operation is not legal,
>> otherwise
>>> we could get
>>>> + // a legalization loop.
>>>> + if (TLI.getOperationAction(N->getOpcode(), VT1) ==
>>> TargetLowering::Legal)
>>>> + return SDValue();
>>>>
>>>> + if (!VT0.isVector())
>>>> + return SDValue();
>>>> +
>>>> + // We don't want expand if the intermediate type is the v2i16,
>>>> + // since v2i16 is not a legal type. This only happens when the
>>>> + // destination has the v2i32 type and the source is v2i8.
>>>> + if (VT1 == MVT::v2i8 && VT0 == MVT::v2i32)
>>>> + return SDValue();
>>>
>>> This is in generic code now, so we can't make assumptions about what
>>> types are legal. We need to ask the target. This function needs to be
>>> made more generic, basically. For example, here we assume that v2i16
>>> isn't legal. That's true for ARM, but quite possibly untrue for other
>>> targets that support 32-bit vector types. Likewise, we may need to
>>> check for which operations are legal as well as which types.
>>>
>>>> + // If there is already an instruction that could be directly
>>>> + // selected for this there is no reason to expand since it will
>>>> + // generate a redundant expand instruction (from type X to type
>>> X).
>>>> + if ((VT1.getSizeInBits() * 2) >= VT0.getSizeInBits())
>>>> + return SDValue();
>>>> +
>>>> + LLVMContext *Context = DAG.getContext();
>>>> +
>>>> + // We need to get the intermediate type. This type will be twice
>>> as
>>>> + // small as the destination type, since all expand instructions
>>> are
>>>> + // from a length X to a length 2*X and we have expand
>> instructions
>>>> + // for all destination types.
>>>> + EVT middleType = EVT::getVectorVT(*Context,
>>>> + EVT::getIntegerVT(*Context,
>>>> +
>> VT0.getScalarType().getSizeInBits()
>>> / 2),
>>>> + VT0.getVectorNumElements());
>>>> + return DAG.getNode(N->getOpcode(), dl, VT0,
>>>> + DAG.getNode(N->getOpcode(), dl,
>>>> + middleType, N->getOperand(0)));
>>>> +}
>>>> +
>>>> /// PromoteIntegerOperand - This method is called when the
>> specified
>>> operand of
>>>> /// the specified node is found to need promotion. At this point,
>>> all of the
>>>> /// result types of the node are known to be legal, but other
>>> operands of the
>>>> @@ -752,7 +789,10 @@
>>>> #endif
>>>> llvm_unreachable("Do not know how to promote this operator's
>>> operand!");
>>>>
>>>> - case ISD::ANY_EXTEND: Res = PromoteIntOp_ANY_EXTEND(N); break;
>>>> + case ISD::ANY_EXTEND:
>>>> + Res = ExpandExtend(N);
>>>> + if (!Res.getNode()) Res = PromoteIntOp_ANY_EXTEND(N);
>>>> + break;
>>>> case ISD::ATOMIC_STORE:
>>>> Res = PromoteIntOp_ATOMIC_STORE(cast<AtomicSDNode>(N));
>>>> break;
>>>> @@ -774,15 +814,20 @@
>>>> case ISD::SELECT: Res = PromoteIntOp_SELECT(N, OpNo);
>> break;
>>>> case ISD::SELECT_CC: Res = PromoteIntOp_SELECT_CC(N, OpNo);
>>> break;
>>>> case ISD::SETCC: Res = PromoteIntOp_SETCC(N, OpNo);
>> break;
>>>> - case ISD::SIGN_EXTEND: Res = PromoteIntOp_SIGN_EXTEND(N);
>> break;
>>>> + case ISD::SIGN_EXTEND:
>>>> + Res = ExpandExtend(N);
>>>> + if (!Res.getNode()) Res = PromoteIntOp_SIGN_EXTEND(N);
>>>> + break;
>>>> case ISD::SINT_TO_FP: Res = PromoteIntOp_SINT_TO_FP(N); break;
>>>> case ISD::STORE: Res =
>>> PromoteIntOp_STORE(cast<StoreSDNode>(N),
>>>> OpNo); break;
>>>> case ISD::TRUNCATE: Res = PromoteIntOp_TRUNCATE(N); break;
>>>> case ISD::FP16_TO_FP32:
>>>> case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N); break;
>>>> - case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N);
>> break;
>>>> -
>>>> + case ISD::ZERO_EXTEND:
>>>> + Res = ExpandExtend(N);
>>>> + if (!Res.getNode()) Res = PromoteIntOp_ZERO_EXTEND(N);
>>>> + break;
>>>> case ISD::SHL:
>>>> case ISD::SRA:
>>>> case ISD::SRL:
>>>> Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp (revision 164117)
>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp (working copy)
>>>> @@ -2565,7 +2565,8 @@
>>>> VT.getVectorNumElements() ==
>>>> Operand.getValueType().getVectorNumElements()) &&
>>>> "Vector element count mismatch!");
>>>> - if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>> ISD::ZERO_EXTEND)
>>>> + if ((OpOpcode == ISD::SIGN_EXTEND || OpOpcode ==
>>> ISD::ZERO_EXTEND) &&
>>>> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>> return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>> getOperand(0));
>>>> else if (OpOpcode == ISD::UNDEF)
>>>> // sext(undef) = 0, because the top bits will all be the
>> same.
>>>> @@ -2581,7 +2582,9 @@
>>>> VT.getVectorNumElements() ==
>>>> Operand.getValueType().getVectorNumElements()) &&
>>>> "Vector element count mismatch!");
>>>> - if (OpOpcode == ISD::ZERO_EXTEND) // (zext (zext x)) ->
>> (zext
>>> x)
>>>> + if (OpOpcode == ISD::ZERO_EXTEND &&
>>>> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>> + // (zext (zext x)) -> (zext x)
>>>> return getNode(ISD::ZERO_EXTEND, DL, VT,
>>>> Operand.getNode()->getOperand(0));
>>>> else if (OpOpcode == ISD::UNDEF)
>>>> @@ -2599,8 +2602,9 @@
>>>> Operand.getValueType().getVectorNumElements()) &&
>>>> "Vector element count mismatch!");
>>>>
>>>> - if (OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>> ISD::SIGN_EXTEND
>>> ||
>>>> - OpOpcode == ISD::ANY_EXTEND)
>>>> + if ((OpOpcode == ISD::ZERO_EXTEND || OpOpcode ==
>>> ISD::SIGN_EXTEND ||
>>>> + OpOpcode == ISD::ANY_EXTEND) &&
>>>> + TLI.isOperationLegal(OpOpcode, Operand.getValueType()))
>>>> // (ext (zext x)) -> (zext x) and (ext (sext x)) -> (sext
>> x)
>>>> return getNode(OpOpcode, DL, VT, Operand.getNode()-
>>>> getOperand(0));
>>>> else if (OpOpcode == ISD::UNDEF)
>>>
>>>
>>>
>>>
>>> On Sep 19, 2012, at 3:47 AM, Silviu Baranga <silbar01 at arm.com> wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> Thanks for the review. It does make sense to do this in a target
>>>> independent way. I've attached a new patch that does this.
>>>>
>>>> I've moved the code ARM-specific code into the legalizer and also
>>>> added tests for sext nodes. It appears that I was mistaken before
>>>> and (sext (sextload)) patterns were not prevented from folding for
>>>> vector types. This was stopping the generation of vaddl-type
>>>> instructions. I think that stopping the folding of (sext
>> (sextload))
>>>> patterns is still OK, because the merging would result in a illegal
>>>> operation.
>>>>
>>>> Cheers,
>>>> Silviu
>>>>
>>>>> -----Original Message-----
>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>> Sent: 17 September 2012 18:09
>>>>> To: Silviu Baranga
>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>> vaddl-
>>> type
>>>>> instructions from extloads on ARM
>>>>>
>>>>> Hi Silviu,
>>>>>
>>>>> Thanks for the update. This is definitely much better. Why do we
>>> need
>>>>> any ARM specific code here, though? Given the problem description,
>>> it
>>>>> seems like this is something the legalizer should be able to
>> reason
>>>>> about without needing custom handling in the target.
>>>>>
>>>>> -Jim
>>>>>
>>>>>
>>>>> On Sep 6, 2012, at 9:34 AM, Silviu Baranga <silbar01 at arm.com>
>> wrote:
>>>>>
>>>>>> Hi Jim,
>>>>>>
>>>>>> Thanks for the review. The BREAK node solution was a bit ugly.
>>>>>>
>>>>>> I've attached a new patch that solves this without a BREAK node.
>>>>>> This is done by stopping the folding of nodes zext nodes when
>>>>>> this will result in a non-legal operation.
>>>>>>
>>>>>> I also had to stop the (zext (zextload x)) folding for vector
>>>>>> types. This is already being done for sext nodes.
>>>>>>
>>>>>> Thanks,
>>>>>> Silviu
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Jim Grosbach [mailto:grosbach at apple.com]
>>>>>>> Sent: 06 September 2012 01:42
>>>>>>> To: Silviu Baranga
>>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>>> Subject: Re: [llvm-commits] [patch] Enable the generation of
>>> vaddl-
>>>>> type
>>>>>>> instructions from extloads on ARM
>>>>>>>
>>>>>>> Hi Silviu,
>>>>>>>
>>>>>>> I like the idea of teaching the compiler to use these
>>> instructions,
>>>>> but
>>>>>>> the BREAK pseudo-instruction feels pretty gross to me. I'd much
>>>>> prefer
>>>>>>> we figure out how to teach the combiner and isel to be smarter
>>>>> directly
>>>>>>> rather than doing something like that.
>>>>>>>
>>>>>>> -Jim
>>>>>>>
>>>>>>> On Sep 4, 2012, at 1:58 AM, Silviu Baranga <silbar01 at arm.com>
>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Currently LLVM will not generate a vaddl-type instructions when
>>> an
>>>>>>> extload
>>>>>>>> node is in the input dag. All vector extload nodes are
>>> implemented
>>>>>>> with
>>>>>>>> a load instruction and a lengthening instruction on ARM. If the
>>>>>>> lengthening
>>>>>>>> instruction is followed by an add instruction, it would be
>> better
>>>>> to
>>>>>>>> generate
>>>>>>>> a vaddl-type instruction instead.
>>>>>>>>
>>>>>>>> To do this, we split all lengthening dag nodes which extend a
>>> type
>>>>> to
>>>>>>> more
>>>>>>>> then twice its size (this is OK because these nodes would be
>>>>>>> implemented by
>>>>>>>> multiple instructions anyway) into two lengthening nodes. One
>>> node
>>>>>>> will be
>>>>>>>> later
>>>>>>>> combined to form the extload node, and the other can be
>> selected
>>> as
>>>>>>> part of
>>>>>>>> a
>>>>>>>> vaddl-type instruction (there are already patterns for this).
>>>>>>>>
>>>>>>>> To stop the dag nodes from re-combining, we insert a BREAK node
>>>>>>> between
>>>>>>>> them. The
>>>>>>>> BREAK node is used as an optimization barrier for the DAG
>>> combiner,
>>>>>>> and
>>>>>>>> might be
>>>>>>>> useful in other cases as well. We remove the BREAK node in the
>>>>>>> instruction
>>>>>>>> selection stage.
>>>>>>>>
>>>>>>>> This approach allows the reuse of existing selection patterns.
>>>>>>>>
>>>>>>>> This patch depends on the following other patch:
>>>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-
>>>>>>> 20120827/149201.
>>>>>>>> html
>>>>>>>>
>>>>>>>> Please review!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>> Silviu<patch.diff>_______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>>
>>>>>> <zext_split.diff>
>>>>>
>>>>>
>>>> <zext_split.diff>
>>>
>>>
>
>
>
>