[llvm-commits] Patch: generalize vector shuffle

Wed Nov 5 11:53:00 PST 2008

Hi Duncan,

On Nov 5, 2008, at 9:25 AM, Duncan Sands wrote:

> Hi Mon Ping,
>
> general comment: there is trailing whitespace on many lines.
>

I'll run a scrubber and make sure they are all removed.

>> @@ -1324,8 +1326,7 @@
>>   const VectorType *MaskTy = dyn_cast<VectorType>(Mask->getType());
>>   if (!isa<Constant>(Mask) || MaskTy == 0 ||
>>       MaskTy->getElementType() != Type::Int32Ty ||
>> -      MaskTy->getNumElements() !=
>> -      cast<VectorType>(V1->getType())->getNumElements())
>> +      MaskTy->getNumElements() <= 1)
>
> Why not allow a mask of length 1?  The DAG combiner or SelectionDAG
> can simplify this of course, but I don't see why it should be illegal.

 From an implementation point of view, there is no reason not to allow  
it.  At a IR level, there is no problem having vector of length 1 but  
I preferred if they would use the element instead and use extract  
element.  Since vector of single element is allowed in the IR, I  
shouldn't allow my prejudices against the concept affect me and allow  
a vector shuffle to create a single vector element :->.

>> Assert1(MV->getNumOperands() > 1,
>> +            "Shufflevector mask can not have one element!", &SV);
>
> Likewise.
>
>> -  unsigned NumElts = cast<VectorType>(V1->getType())- 
>> >getNumElements();
>> +
>> +  unsigned NumElts = cast<VectorType>(Mask->getType())- 
>> >getNumElements();
>> +  unsigned V1NumElts = cast<VectorType>(V1->getType())- 
>> >getNumElements();
>>   const Type *EltTy = cast<VectorType>(V1->getType())- 
>> >getElementType();
>
> How about SrcNumElts instead of V1NumElts, and MaskNumElts instead  
> of NumElts?
> Likewise in the rest of the patch: the names given to the number of  
> elements
> are not always self-explanatory.
>

That does seem clearer.  I'll make the change.

>> +//===---- LegalizeTypesUtils.cpp - Utilities for Legalization of  
>> types
>
> Why not only do it in LegalizeTypes?
>

Are you asking why these utilities are not part of LegalizeTypes  
instead of placing them in their own class?  If so, LegalizeDAG would  
need access to the utility and I thought it was cleaner to have a  
separate class that both LegalizeDAG and LegalizeTypes can have access  
to instead of dirtying up LegalizeTypes with these utilities.

> Also, I think SplitVecRes_VECTOR_SHUFFLE needs some changes, but I  
> didn't
> spot any.
>
>> +void DAGLegalizeTypeUtils::DetermineVectorShuffleUse(SelectionDAG&  
>> CurDAG,
>> +                               SDValue Mask, unsigned Start,  
>> unsigned End,
>> +                               unsigned NumElements, unsigned Half,
>> +                               MVT NewEltVT, SmallVector<SDValue,  
>> 16>& NewMask,
>> +                               VecUse& LhsUse, VecUse& RhsUse) {
>
> How about a comment describing this function.  I found the names  
> LhsUse and
> RhsUse not very self-explanatory.
>

I put the comment in the .h file for the function but didn't  
duplicated in the .cpp file.  I'll copy it over so people don't have  
to to go the header file to figure out what it does.  How about  
Src1Use and Src2Use instead of LhsUse and RhsUse?  We are trying to  
figure out how the input vectors are being used in the vector shuffle.

>> +  if (LLhsUse != VecUseBoth && LRhsUse != VecUseBoth) {
>
> I have the impression that you could reduce the amount of duplicated
> code by swapping Lo and Hi parts depending on what is used.  By the  
> way,
> I thought a bit about how to do this splitting, and I think what you  
> are
> doing is pretty optimal.  However I wasn't brave enough to check  
> that you
> are actually doing what I think you are doing :)  I will check the  
> algorithm
> when you have a final version.
>

After you point this out,  I think I understand now Evan's point too :- 
 >.  I can't factor out how to handle the vector is being used but I  
can factor out the common code that either extract the low part of   
the vector, the high part of the vector, or create an undef vector  
when it is not being used.  I'll make that change.

>> +    Ops.clear();
>
> Not sure why you have Ops.clear() at the end of each branch, rather
> than once after the end of the "if"...
>
>> +    Ops.clear();
>> +  }
>> +}
>
> It's also rather pointless at the end of the function!

Both of those are a mistake.  I cleaned that part up yesterday by only  
putting Ops.clear() before I start inserting      elements into Ops  
but I didn't want to send another patch for that small change.

>> +void DAGTypeLegalizer::SplitVecRes_EXTRACT_SUBVECTOR(SDNode *N,  
>> SDValue &Lo,
>> +                                                     SDValue &Hi) {
>> +  // We only support constant index.
>> +  MVT LoVT, HiVT;
>> +  GetSplitDestVTs(N->getValueType(0), LoVT, HiVT);
>> +  unsigned LoNumElts = LoVT.getVectorNumElements();
>> +  unsigned HiNumElts = HiVT.getVectorNumElements();
>> +
>> +  SDValue Vec = N->getOperand(0);
>> +  SDValue Idx = N->getOperand(1);
>> +
>> +  if (TLI.isTypeLegal(Vec.getValueType())) {
>
> Why not just always do this?  It should work fine even if Vec is  
> illegal.

That is what I had originally had but then though that if the incoming  
vector is illegal, it would be better to split it at this point.   
There is no point doing it here as SplitVecOp_EXTRACT_SUBVECTOR would  
do this for me anyway.

>> +    // If the incoming vector type is legal, just extract the  
>> correct subpart.
>> +    ConstantSDNode *CIdx = cast<ConstantSDNode>(Idx);
>
> You don't have to assume that Idx is constant here, you just have to  
> do an
> ADD below.

I'll remove the restriction.

>
>
>> +    Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, LoVT, Vec, Idx);
>> +    Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, HiVT, Vec,
>> +                     DAG.getConstant(CIdx->getZExtValue() +  
>> LoNumElts,
>> +                                     Idx.getValueType()));
>
> If LoVT != HiVT then there is no reason to think that these new
> indices are a multiple of the lengths LoVT/HiVT, i.e. you may be
> producing illegal subvectors here.  I think you should add an
> assertion that LoVT == HiVT until the vector widening stuff is
> complete (and splitting is no longer done for non-power-of-two
> vectors).
>

That is true.  In generally, it will not work but there are cases were  
it will.  For the case of a constant index, I can check to make sure  
that the indices are a multiple of the length.  Otherwise, the LoVT ==  
HiVT.  I want to make sure the case of vector 3 should work.  As we  
discussing previously in widening emails, for all architectures, widen  
will always widen a vector to a power of 2 so LoVT should always HiVT.

>> +  } else {
>
> Aren't you just repeating here the work done by
> SplitVecOp_EXTRACT_SUBVECTOR?

Yes, I'm.  I was splitting the op early but there is no reason for me  
to do so.  I misunderstood the structure of LegalizeType.

>> -  case ISD::VECTOR_SHUFFLE:
>> +  case ISD::VECTOR_SHUFFLE: {
>> +    assert(Node->getOperand(0).getValueType().getVectorNumElements 
>> () ==
>> +           Op.getValueType().getVectorNumElements() &&
>> +           "Vector shuffle must be normalized");
>
> Why must it be normalized, and who takes care of this?  Likewise  
> below.

When we build the SelectionDAG, we normalize all the vector shuffles  
and introduce any new operation.   The pros of doing this way is that  
after DAG construction, none of the DAG passes would have to worry  
about non-normalized shuffles, allow optimizations after   
normalization, and in LegalizeTypes, we wouldn't have to do a pass to  
normalize the vector shuffles. (We might be able to fold the work when  
iterate the DAG to build the worklist in LegalizeTypes).  The downside  
is no DAG optimization can generate a non-normalized shufflevector.   
The latter shouldn't be a problem since any optimizations that we want  
do with non-normalized shuffle vectors should be done in phases like  
InstCombine.

>
>> -  case ISD::FCOS: {
>> +  case ISD::FCOS:
>> +  case ISD::CTPOP:
>> +  case ISD::CTTZ:
>> +  case ISD::CTLZ: {
>
> What's this about?
>

This actually a change in widening when I was testing the vector  
shuffle.  It shouldn't be part of this patch as it should be part of  
the second widening patch when support for widening goes into  
LegalizeTypes.

>> +    // Check if we need to normalize the vector shuffle
>
> I'm not sure this belongs in LegalizeTypes.  It seems to
> me that this is to do with legalizing the operation, because
> it is unsupported by targets, not legalizing the types.  As
> such it belongs in LegalizeDAG.  In any case, you don't
> actually do anything here!
>

Ack! I been looking at this patch so long that missed the comments.   
It should not be there as I created it when I was experiment with  
normalizing in LegalizeTypes.  I decided not to do it that way (see  
above) and I thought I deleted all the code for that experiment but I  
didn't.

>
>> +  // Normalize the shuffle vector since mask and vector length  
>> don't match.
>> +  assert(NumElems != 1 && "Vector shuffle can not result in 1  
>> element");
>
> Why not allow this (and immediately simplify to something else)?
>
> If this only produces normalized vectors, why are changes needed  
> elsewhere
> (eg: LegalizeTypes)?  At this point I am quite confused!
>

The changes in LegalizesTypes are needed to take advantage of  
generalized the vector shuffle.   I should have copied my original  
email for the patch to this last patch as an introduction.   With  
these changes, we try to keep as much as the original vector program  
(after optimization) structure all the way to the backend.  The  
incoming program uses generalize vector shuffles, in SelectionDAG  
building, we normalize, and then in Legalization, we convert any of  
the illegal normalized shuffles to concat, extract sub vectors, etc.   
Before, legalization would break it down to insert/extracts which we  
had a hard time rebuilding.   For examples like transposing a vector,  
this allows us to generate much better code.  Sorry about the confusion!

>>   CCIfType<[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
>> -            CCAssignToReg<[XMM0,XMM1]>>,
>> +            CCAssignToReg<[XMM0,XMM1,XMM2,XMM3]>>,
>
> What's this about?
>

This shouldn't go in this patch either.  I needed it to run various  
tests on vector shuffles where we have code that pass 8 element  
vectors in registers.  This will go into another patch as we need it  
for some work here at apple.  XMM2 and XMM3 are non ABI compliant for  
X86 but since all front ends lower their code, this shouldn't cause  
problems

Thanks for your effort of going through the patch,
   -- Mon Ping