[llvm-commits] Fix PR 13837: Optimization for vector creation from type conversion of another vector

Zhao, Weiming weimingz at quicinc.com
Wed Sep 19 15:03:36 PDT 2012


Hi Duncan,

I'm attaching a updated patch, which cleans some testing code (I was testing some other conversions).
Please review it.

Thanks,
weiming

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Weiming Zhao
Sent: Wednesday, September 19, 2012 2:26 PM
To: 'Duncan Sands'; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Fix PR 13837: Optimization for vector creation from type conversion of another vector

Hi Duncan,

Sorry I missed your message.  I was monitoring only the messages that sent to me :(

Very appreciate for your review and valuable and detailed feedback.

I followed all of your suggests except:
Why restrict yourself to just this set of casts?  
Currently, LLVM supports 4 type conversions: S/U INT <-> FP. And they have corresponding vector instructions. Do you mean re-interpretation cast?
I also looked at other conversions. For example, for zero_exd/sign_exd, I tested from char3->int3, but ARM backend generates "AND" operation and no need vector instructions at all.

I fixed up everything else.
Please help to review them. 

Thanks,
Weiming




-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu
[mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
Sent: Friday, September 14, 2012 1:19 AM
To: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Fix PR 13837: Optimization for vector creation from type conversion of another vector

Hi Weiming,

> When a vector is created from type conversion, it generates multiple 
> scalar conversion instead of a vectorized conversion.
...
> Current LLVM first extracts each element of "in" and then does a 
> fp-to-si for each element and then builds the result vector.
>
> This patch allows a vectorized fp-to-si conversion against the input 
> vector directly, thus results in better performance.

> --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> @@ -7933,6 +7933,50 @@ SDValue DAGCombiner::visitBUILD_VECTOR(SDNode *N) {
>      AllAnyExt &= AnyExt;
>    }
>
> +  // Check if we can combine cast and build  SmallVector<SDValue, 8> 
> + CastOps;

Since this vector isn't used until later, it can be moved later.

> +
> +  unsigned ElmOp;
> +  EVT CastFromType;
> +  bool HasNonConvOp = false;
> +  for (unsigned i = 0; i != NumInScalars; ++i) {
> +    SDValue In = N->getOperand(i);
> +    ElmOp = In.getOpcode();
> +    if (ElmOp == ISD::UNDEF)
> +      continue;
> +    else if (ElmOp == ISD::SINT_TO_FP || ElmOp == ISD::UINT_TO_FP ||

No need for "else" here (style).

> +      ElmOp == ISD::FP_TO_SINT || ElmOp == ISD::FP_TO_UINT) {

Why restrict yourself to just this set of casts?  Isn't this optimization worth considering for any cast?

> +      CastFromType = In.getOperand(0).getValueType();
> +      break;
> +    }
> +    else {

No need for "else" here (style).

> +      // Op_i is neither UNDEF nor type conversion
> +      HasNonConvOp = true;
> +      break;
> +    }
> +  }

I think in the above loop you should already check that all casts are of the same type.  You can do this by initializing ElmOp to DELETED_NODE (i.e.
zero).
Do not set ElmOp if you have an undef node.  If ElmOp is non-zero and is different to the current node's opcode then reset ElmOp to zero and break.
If ElmOp is non-zero and is equal to the current node's opcode then continue.
Otherwise, check that you have a valid cast opcode, and if so set ElmOp to it otherwise break.

You also need to check that all cast operands have the same type.  After all, one might be a cast of i8 to double and another a cast of i32 to double.  So I suggest that in parallel with ElmOp you also keep track of CastFromType in the same way, breaking out if there is a mismatch.

> +
> +  if (!HasNonConvOp) {

If you follow the above scheme then you can get rid of HasNonConvOp, and just check that ElmOp != DELETED_NODE.

> +    // Since AllOpererndsUnfer has been checked, here ElmOp must be 
> + one of the

AllOpererndsUnfer doesn't exist.

> +    // type conversions.

This is not true in the case when every operand is undef (in which case I expect you to crash below).  Using the scheme I described above would take care of this.

  Now, check if the all operands are either of the same
> +    // type conversion or UNDEF.

The CastOps vector can be declared here.  You can reserve NumInScalars space since you know already that that's the number of elements you will push in.

> +    for (unsigned i = 0; i != NumInScalars; ++i) {
> +        if (N->getOperand(i).getOpcode() == ElmOp)

Wrong indentation (style).

> +          CastOps.push_back(N->getOperand(i).getOperand(0));
> +        else if (N->getOperand(i).getOpcode() == ISD::UNDEF)

With the scheme I described above, if the opcode is not ElmOp then it is UNDEF, so you here you can just do
            else {
              assert(N->getOperand(i).getOpcode() == ISD::UNDEF);
              CastOps.push_back(DAG.getUNDEF(CastFromType));
            }

> +          CastOps.push_back(DAG.getUNDEF(CastFromType));
> +      else
> +        break;

And this last else would be pointless.

> +    }
> +  }
> +
> +  if (CastOps.size() == NumInScalars) {

This can be moved inside the previous block.  If you follow the scheme I suggest then in fact this test will never fail if you are in the previous if block so can be removed.

> +    EVT VecVT = EVT::getVectorVT(*DAG.getContext(), CastFromType,
NumInScalars);
> +    SDValue Vec = DAG.getNode(ISD::BUILD_VECTOR, dl,
> +        VecVT, &CastOps[0], CastOps.size());
> +    return DAG.getNode(ElmOp, dl, VT, Vec);

This may create an illegal operation.  So you should condition this on either creating illegal operations being allowed (!LegalOperations) or on the operation being legal (check using TLI.isOperationLegal).

> +  }
> +
>    // In order to have valid types, all of the inputs must be extended
from the
>    // same source type and all of the inputs must be any or zero extend.
>    // Scalar sizes must be a power of two.

As you seem to have missed several failure modes (all operands undef, all operands doing the same cast operation but with source types of different
size) please add some additional testcases covering these possibilities.

Ciao, Duncan.
_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-PR-13837-Combine-Cast-and-Build_Vector-when-possible.patch
Type: application/octet-stream
Size: 10941 bytes
Desc: 0003-PR-13837-Combine-Cast-and-Build_Vector-when-possible.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120919/b28f5e41/attachment.obj>


More information about the llvm-commits mailing list