[llvm-commits] [PATCH] ARM: Allow long extensions to combine into vmull

Sebastian Pop spop at codeaurora.org
Fri Nov 30 11:09:48 PST 2012


Committed as r169024.

Sebastian

On Fri, Nov 30, 2012 at 12:10 PM, Sebastian Pop <spop at codeaurora.org> wrote:
> Hi David,
>
> I will commit your patch.
>
> I haven't realized that you don't have svn write access.
> Can you please work with Chris to get svn write access?
> See the instructions here:
> http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
>
> Thanks,
> Sebastian
>
> On Fri, Nov 30, 2012 at 11:22 AM, David Peixotto
> <dpeixott at codeaurora.org> wrote:
>> Ping
>>
>>
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>>
>>
>>
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Tuesday, November 27, 2012 9:17 PM
>>
>>
>> To: David Peixotto
>> Cc: Pete Couperus; llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH] ARM: Allow long extensions to combine
>> into vmull
>>
>>
>>
>> LGTM
>>
>> On Nov 26, 2012, at 4:47 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
>>
>> Thanks for the review, Evan. I’ve attached an updated patch that addresses
>> your concern by renaming the functions to indicate they are for VMULL
>> lowering (e.g. SkipExtensionForVMULL). I also updated the unit test to work
>> with –march=arm –mattr=+neon instead of specifying –mcpu=cortex-a8.
>>
>>
>>
>> Please review further or commit if it looks good. Thanks!
>>
>>
>>
>> -David
>>
>>
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>>
>>
>>
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Sunday, November 25, 2012 12:02 PM
>> To: David Peixotto
>> Cc: 'Pete Couperus'; llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH] ARM: Allow long extensions to combine
>> into vmull
>>
>>
>>
>> The patch looks fine to me. However, I don't like the generic names of the
>> functions AddRequiredExtension, SkipLoadExtension, and SkipExtension. These
>> functions are only used for VMULL lowering and should be named as such.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Evan
>>
>>
>>
>> On Nov 15, 2012, at 10:47 AM, David Peixotto <dpeixott at codeaurora.org>
>> wrote:
>>
>>
>>
>>
>> Thanks for the review, Pete. I updated the patch based on your comments and
>> attached the new version.
>>
>>
>>
>> Reviewers, please send any other feedback.
>>
>>
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>>
>>
>>
>>
>> From: Pete Couperus [mailto:pjcoup at gmail.com]
>> Sent: Wednesday, November 14, 2012 10:21 PM
>> To: David Peixotto
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH] ARM: Allow long extensions to combine
>> into vmull
>>
>>
>>
>> Hi David,
>>
>> This mostly looks fine to me.
>> Minor comments:
>> 1. The body in your SkipLoadExtension is overindented (4 spaces, should be
>> 2).
>> 2. I'd reverse the conditional in AddRequiredExtension from:
>>   if (OrigTy.getSizeInBits() < 64) {
>>     [stuff]
>>   }
>>   return N;
>> to:
>>   if (OrigTy.is64BitVector())
>>     return N;
>>   [stuff]
>>
>> Maybe 2 is personal preference, but I think it reads easier this way.
>> Otherwise, I think this is good.
>>
>> Pete
>>
>> On Wed, Nov 14, 2012 at 5:06 PM, David Peixotto <dpeixott at codeaurora.org>
>> wrote:
>>
>> Hi Pete,
>>
>>
>>
>> As long as we are fixing it I think it makes sense to handle both cases
>> (bare sign/zero extend and sign/zero load extend). If you have no
>> objections, let’s move ahead with my patch since it is already factored that
>> way.
>>
>>
>>
>> I’ve attached an updated version that I rebased against the current head
>> (contents are the same as my original patch). Please take a look and let me
>> know what you think and we can work together to get it merged.
>>
>>
>>
>> -David
>>
>>
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>>
>>
>>
>>
>> From: Pete Couperus [mailto:pjcoup at gmail.com]
>> Sent: Tuesday, November 13, 2012 10:09 PM
>>
>>
>> To: David Peixotto
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH] ARM: Allow long extensions to combine
>> into vmull
>>
>>
>>
>> Hi David,
>>
>> I had basically aimed to minimize the changes, to address one problem.  I
>> agree with you, and think there are potential problems in this area...zext
>> vs. sext, and SIGN/ZERO_EXTEND vs. [SZ]EXTLOAD.  Without the long sign/zero
>> extension change, we'll still trigger the outer assertion...but just writing
>> that makes me feel that we might as well put that logic in as well.  Since
>> that is essentially your patch, want to have another go?  Or, would you like
>> me to fold it into mine?  I'll try and be pleasantly persistent to get
>> either of them in :).
>>
>> Pete
>>
>> On Tue, Nov 13, 2012 at 1:33 PM, David Peixotto <dpeixott at codeaurora.org>
>> wrote:
>>
>> Hi Pete,
>>
>>
>>
>> Thanks for updating your patch.
>>
>>
>>
>> I don’t think I was able to trigger the long sign/zero extension with a
>> simple test case either, although I don’t know of a reason why it would not
>> be possible to trigger that path. If we don’t want to put the full logic in
>> to support the possibility how about we add a comment and an assertion to
>> verify that the underlying node is a 64-bit vector. That way if our
>> assumption is wrong we can quickly identify and fix the issue.
>>
>>
>>
>> -David
>>
>>
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>>
>>
>>
>>
>> From: Pete Couperus [mailto:pjcoup at gmail.com]
>> Sent: Monday, November 12, 2012 10:02 PM
>> To: David Peixotto
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH] ARM: Allow long extensions to combine
>> into vmull
>>
>>
>>
>> Hi David,
>>
>> Thanks for taking a look.
>> My apologies, I hadn't seen your patch when I'd written this up.
>> And, you're right, I neglected the v2i8 case.
>> I originally had logic in for handling long sign/zero extension as well, but
>> I couldn't trigger that with any of the basic test cases I thought of.
>> Curiously, the unsigned/long zero extension cases work in 3.1, but the
>> codegen for unsigned v2i16->v2i64 followed by mul is substantially worse
>> than for the signed case.
>> All the test cases that you sent pass with this new patch (allowing the
>> v2i8->v2i64 extension).
>> Perhaps someone else wants to take a look :).
>>
>> Pete
>>
>> On Mon, Nov 12, 2012 at 10:00 AM, David Peixotto <dpeixott at codeaurora.org>
>> wrote:
>>
>> Hi Pete,
>>
>>
>>
>> I sent a patch for this a while back, but never got any feedback and failed
>> to push it all the way through. I’ve attached my patch for reference.
>>
>>
>>
>> It looks like the patches are fairly similar in their intent. I have a few
>> more test cases, so please include those in your patch as well. It would
>> also be good to include some of the comments, particularly the SkipExtension
>> comment that explains the motivation for that function.
>>
>>
>>
>> The main differences between the patches that I see are:
>>
>>
>>
>> 1)      Handling a sign extend that is not large enough
>>
>> 2)      Handling v2i8 vector size
>>
>>
>>
>> I’ll expand on these points below.
>>
>>
>>
>> Index: lib/Target/ARM/ARMISelLowering.cpp
>>
>> ===================================================================
>>
>> --- lib/Target/ARM/ARMISelLowering.cpp            (revision 167684)
>>
>> +++ lib/Target/ARM/ARMISelLowering.cpp         (working copy)
>>
>> @@ -4922,11 +4922,30 @@
>>
>> static SDValue SkipExtension(SDNode *N, SelectionDAG &DAG) {
>>
>>    if (N->getOpcode() == ISD::SIGN_EXTEND || N->getOpcode() ==
>> ISD::ZERO_EXTEND)
>>
>>      return N->getOperand(0);
>>
>>
>>
>> What it N is a small vector here? Don’t we need to make sure that we have a
>> sign/zero extend that will extend the size to 64-bit vector?
>>
>>
>>
>> -  if (LoadSDNode *LD = dyn_cast<LoadSDNode>(N))
>>
>> -    return DAG.getLoad(LD->getMemoryVT(), N->getDebugLoc(), LD->getChain(),
>>
>> -                       LD->getBasePtr(), LD->getPointerInfo(),
>> LD->isVolatile(),
>>
>> -                       LD->isNonTemporal(), LD->isInvariant(),
>>
>> -                       LD->getAlignment());
>>
>> +
>>
>> +  if (LoadSDNode *LD = dyn_cast<LoadSDNode>(N)) {
>>
>> +    SDValue BareLD = DAG.getLoad(LD->getMemoryVT(), N->getDebugLoc(),
>>
>> +                                LD->getChain(), LD->getBasePtr(),
>>
>> +                                LD->getPointerInfo(), LD->isVolatile(),
>>
>> +                                LD->isNonTemporal(), LD->isInvariant(),
>>
>> +                                LD->getAlignment());
>>
>> +    if (BareLD.getValueType().is64BitVector())
>>
>> +      return BareLD;
>>
>> +
>>
>> +    // This is a long extension, v4i8 -> v4i32 or v2i16 -> v2i64
>>
>> +    MVT ExtVT;
>>
>> +    if (LD->getMemoryVT() == MVT::v4i8)
>>
>> +      ExtVT = MVT::v4i16;
>>
>> +    else if (LD->getMemoryVT() == MVT::v2i16)
>>
>> +      ExtVT = MVT::v2i32;
>>
>>
>>
>> What about MVT::v2i8. We can handle that case similar to v2i16 by extending
>> the size to v2i32.
>>
>>
>>
>> +
>>
>> +    if (LD->getExtensionType() == ISD::SEXTLOAD)
>>
>> +      return DAG.getNode(ISD::SIGN_EXTEND, BareLD.getDebugLoc(), ExtVT,
>> BareLD);
>>
>> +    else if (LD->getExtensionType() == ISD::ZEXTLOAD)
>>
>> +      return DAG.getNode(ISD::ZERO_EXTEND, BareLD.getDebugLoc(), ExtVT,
>> BareLD);
>>
>> +
>>
>> +    llvm_unreachable("Unexpected extended load type for SkipExtension.");
>>
>> +  }
>>
>>    // Otherwise, the value must be a BUILD_VECTOR.  For v2i64, it will
>>
>>    // have been legalized as a BITCAST from v4i32.
>>
>>    if (N->getOpcode() == ISD::BITCAST) {
>>
>>
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>>
>>
>>
>>
>> From: llvm-commits-bounces at cs.uiuc.edu
>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Pete Couperus
>> Sent: Saturday, November 10, 2012 11:48 PM
>> To: llvm-commits at cs.uiuc.edu
>> Subject: [llvm-commits] [PATCH] ARM: Allow long extensions to combine into
>> vmull
>>
>>
>>
>> Hello,
>>
>> The ARM backend currently crashes when trying to combine:
>>         %tmp1 = load <4 x i8>* %A
>>         %tmp2 = load <4 x i8>* %B
>>         %tmp3 = sext <4 x i8> %tmp1 to <4 x i32>
>>         %tmp4 = sext <4 x i8> %tmp2 to <4 x i32>
>>         %tmp5 = mul <4 x i32> %tmp3, %tmp4
>>
>> into a vmull.  This was reported as PR 12281.
>> This patch fixes this problem.
>> Please review!
>> Thanks,
>>
>> Pete
>>
>>
>>
>>
>>
>>
>>
>> <0001-Codegen-failure-for-vmull-with-small-vectors.patch>_______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>> <0001-Codegen-failure-for-vmull-with-small-vectors.patch>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation




More information about the llvm-commits mailing list