[llvm-commits] Fix PR 13837: Optimization for vector creation from type conversion of another vector

Mon Oct 15 14:29:30 PDT 2012

Ping?

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

-----Original Message-----
From: Weiming Zhao [mailto:weimingz at codeaurora.org] 
Sent: Wednesday, October 03, 2012 11:03 AM
To: 'Duncan Sands'
Cc: 'llvm-commits at cs.uiuc.edu'; 'Jason Sams'; 'rajav at codeaurora.org'
Subject: RE: [llvm-commits] Fix PR 13837: Optimization for vector creation
from type conversion of another vector

Hi Duncan,

I added all the unary conversions that could be vectorized as well as test
cases for ARM. 
Besides, the code is quite extensible for adding more similar transforms in
future. 
For operations without vec support on ARM,  I still add corresponding test
cases, but they  just check if the function is emitted (no crash).

It would be nice if we have test cases for all platforms. However, as our
dev environment is solely ARM, we have to leave the test cases for x86,
MIPS. PowerPC, Sparc etc. to other contributors.

Thanks,
Weiming

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

-----Original Message-----
From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan Sands
Sent: Tuesday, October 02, 2012 12:17 AM
To: weimingz at codeaurora.org
Cc: llvm-commits at cs.uiuc.edu; 'Jason Sams'
Subject: Re: [llvm-commits] Fix PR 13837: Optimization for vector creation
from type conversion of another vector

Hi Weiming,

> Ping ?
>

...

>
> I added two cases: FNEG and TRUNCATE (narrowing) and also added 
> corresponding test cases. (see attached patch) For widening operations 
> (zext, sext) like i32->i64 and float -> float64, since there's no 
> corresponding vector instr. on ARM, so I didn't add them.

as this is a general DAG transform, not an ARM specific combine, what ARM
supports is not relevant here.  If you need help writing testcases for the
transforms that can't occur on ARM hopefully someone on the list will be
able to help you.

Ciao, Duncan.

>
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
> hosted by The Linux Foundation
>
>
> -----Original Message-----
> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan 
> Sands
> Sent: Thursday, September 20, 2012 12:21 AM
> To: Weiming Zhao
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] Fix PR 13837: Optimization for vector 
> creation from type conversion of another vector
>
> Hi Weiming,
>
>> Sorry I missed your message.  I was monitoring only the messages that 
>> sent to me :(
>
> oops, sorry about that.
>
>> Very appreciate for your review and valuable and detailed feedback.
>>
>> I followed all of your suggests except:
>> Why restrict yourself to just this set of casts?
>> Currently, LLVM supports 4 type conversions: S/U INT <-> FP.
>
> I had in mind things like zext from i32 to i64 and other integer to 
> integer or floating point to floating point casts.  It seems to me 
> that this kind of transformation could be helpful for them too.
>
> I will comment on your new patch later.
>
> Ciao, Duncan.
>
>    And they have
>> corresponding vector instructions. Do you mean re-interpretation cast?
>> I also looked at other conversions. For example, for 
>> zero_exd/sign_exd, I tested from char3->int3, but ARM backend 
>> generates "AND" operation and no need vector instructions at all.
>>
>> I fixed up everything else.
>> Please help to review them.
>>
>> Thanks,
>> Weiming
>>
>>
>>
>>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu 
>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
>> Sent: Friday, September 14, 2012 1:19 AM
>> To: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] Fix PR 13837: Optimization for vector 
>> creation from type conversion of another vector
>>
>> Hi Weiming,
>>
>>> When a vector is created from type conversion, it generates multiple 
>>> scalar conversion instead of a vectorized conversion.
>> ...
>>> Current LLVM first extracts each element of "in" and then does a 
>>> fp-to-si for each element and then builds the result vector.
>>>
>>> This patch allows a vectorized fp-to-si conversion against the input 
>>> vector directly, thus results in better performance.
>>
>>> --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> @@ -7933,6 +7933,50 @@ SDValue DAGCombiner::visitBUILD_VECTOR(SDNode
>>> *N)
> {
>>>        AllAnyExt &= AnyExt;
>>>      }
>>>
>>> +  // Check if we can combine cast and build  SmallVector<SDValue,
>>> + 8> CastOps;
>>
>> Since this vector isn't used until later, it can be moved later.
>>
>>> +
>>> +  unsigned ElmOp;
>>> +  EVT CastFromType;
>>> +  bool HasNonConvOp = false;
>>> +  for (unsigned i = 0; i != NumInScalars; ++i) {
>>> +    SDValue In = N->getOperand(i);
>>> +    ElmOp = In.getOpcode();
>>> +    if (ElmOp == ISD::UNDEF)
>>> +      continue;
>>> +    else if (ElmOp == ISD::SINT_TO_FP || ElmOp == ISD::UINT_TO_FP
>>> + ||
>>
>> No need for "else" here (style).
>>
>>> +      ElmOp == ISD::FP_TO_SINT || ElmOp == ISD::FP_TO_UINT) {
>>
>> Why restrict yourself to just this set of casts?  Isn't this 
>> optimization worth considering for any cast?
>>
>>> +      CastFromType = In.getOperand(0).getValueType();
>>> +      break;
>>> +    }
>>> +    else {
>>
>> No need for "else" here (style).
>>
>>> +      // Op_i is neither UNDEF nor type conversion
>>> +      HasNonConvOp = true;
>>> +      break;
>>> +    }
>>> +  }
>>
>> I think in the above loop you should already check that all casts are 
>> of the same type.  You can do this by initializing ElmOp to 
>> DELETED_NODE
> (i.e.
>> zero).
>> Do not set ElmOp if you have an undef node.  If ElmOp is non-zero and 
>> is different to the current node's opcode then reset ElmOp to zero 
>> and
> break.
>> If ElmOp is non-zero and is equal to the current node's opcode then 
>> continue.
>> Otherwise, check that you have a valid cast opcode, and if so set 
>> ElmOp to it otherwise break.
>>
>> You also need to check that all cast operands have the same type.
>> After all, one might be a cast of i8 to double and another a cast of
>> i32 to double.  So I suggest that in parallel with ElmOp you also 
>> keep track of CastFromType in the same way, breaking out if there is 
>> a
> mismatch.
>>
>>> +
>>> +  if (!HasNonConvOp) {
>>
>> If you follow the above scheme then you can get rid of HasNonConvOp, 
>> and just check that ElmOp != DELETED_NODE.
>>
>>> +    // Since AllOpererndsUnfer has been checked, here ElmOp must be 
>>> + one of the
>>
>> AllOpererndsUnfer doesn't exist.
>>
>>> +    // type conversions.
>>
>> This is not true in the case when every operand is undef (in which 
>> case I expect you to crash below).  Using the scheme I described 
>> above would take care of this.
>>
>>     Now, check if the all operands are either of the same
>>> +    // type conversion or UNDEF.
>>
>> The CastOps vector can be declared here.  You can reserve 
>> NumInScalars space since you know already that that's the number of 
>> elements you will
> push in.
>>
>>> +    for (unsigned i = 0; i != NumInScalars; ++i) {
>>> +        if (N->getOperand(i).getOpcode() == ElmOp)
>>
>> Wrong indentation (style).
>>
>>> +          CastOps.push_back(N->getOperand(i).getOperand(0));
>>> +        else if (N->getOperand(i).getOpcode() == ISD::UNDEF)
>>
>> With the scheme I described above, if the opcode is not ElmOp then it 
>> is UNDEF, so you here you can just do
>>               else {
>>                 assert(N->getOperand(i).getOpcode() == ISD::UNDEF);
>>                 CastOps.push_back(DAG.getUNDEF(CastFromType));
>>               }
>>
>>> +          CastOps.push_back(DAG.getUNDEF(CastFromType));
>>> +      else
>>> +        break;
>>
>> And this last else would be pointless.
>>
>>> +    }
>>> +  }
>>> +
>>> +  if (CastOps.size() == NumInScalars) {
>>
>> This can be moved inside the previous block.  If you follow the 
>> scheme I suggest then in fact this test will never fail if you are in 
>> the previous if block so can be removed.
>>
>>> +    EVT VecVT = EVT::getVectorVT(*DAG.getContext(), CastFromType,
>> NumInScalars);
>>> +    SDValue Vec = DAG.getNode(ISD::BUILD_VECTOR, dl,
>>> +        VecVT, &CastOps[0], CastOps.size());
>>> +    return DAG.getNode(ElmOp, dl, VT, Vec);
>>
>> This may create an illegal operation.  So you should condition this 
>> on either creating illegal operations being allowed
>> (!LegalOperations) or on the operation being legal (check using
TLI.isOperationLegal).
>>
>>> +  }
>>> +
>>>      // In order to have valid types, all of the inputs must be 
>>> extended
>> from the
>>>      // same source type and all of the inputs must be any or zero
extend.
>>>      // Scalar sizes must be a power of two.
>>
>> As you seem to have missed several failure modes (all operands undef, 
>> all operands doing the same cast operation but with source types of 
>> different
>> size) please add some additional testcases covering these possibilities.
>>
>> Ciao, Duncan.
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>