[PATCH] D152205: [Aarch64][SVE]SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.

Mon Jul 24 08:57:42 PDT 2023

CarolineConcatto added a comment.

Hi Dinar,
I re-wrote your code in the Lowering to something I could understand better.
I will still spend some time trying to understand the mask logic and see if the new scalable vector is correct.
But I believe  it is good to share my comments for now

  +  // Avoid producing TBL and TBL2 instructions if we don't know
  +  // SVE register size or minimal is unequal to maximum size.
  +  if (!MaxSVESize || MinSVESize != MaxSVESize)
  +    return SDValue();

  -  return SDValue();
  +  // Element type is ???
  +  unsigned BitsPerElt = VT.getVectorElementType().getSizeInBits();
  +  if (BitsPerElt != 8 && BitsPerElt != 16 && BitsPerElt != 32 &&
  +      BitsPerElt != 64)
  +    return SDValue();
  +
  +  bool Swap = false;
  +  if (Op1.isUndef() || isZerosVector(Op1.getNode())) {
  +    // Swapped shuffle operands, but we have to recalculate indexes afterwards.
  +    std::swap(Op1, Op2);
  +    Swap = true;
  +  }
  +
  +
  +  bool IsUndefOrZero = Op2.isUndef() || isZerosVector(Op2.getNode());
  +  unsigned NumElts = VT.getVectorNumElements();
  +  unsigned IndexLen = MinSVESize / BitsPerElt;
  +  unsigned FillElements = IndexLen - NumElts;
  +
  +  // Set the Mask with the correct indexes
  +  SmallVector<SDValue, 8> TBLMask;
  +  for (int Val : ShuffleMask) {
  +    unsigned Offset = Val;
  +    if (Swap)
  +      Offset = Offset < NumElts ? Offset + NumElts : Offset - NumElts;
  +    // why we need to set the mask to 255 when one of the vectors is zero/undef?
  +    else if(Offset >= NumElts){
  +       if(IsUndefOrZero)
  +           Offset = 255;
  +       else if (IndexLen > NumElts)
  +           Offset += IndexLen - NumElts;
  +    }
  +    TBLMask.push_back(DAG.getConstant(Offset, DL, MVT::i64));
  +  }
  +  // Filling the rest of the mask with 255
  +  for (unsigned i = 0; i < FillElements; ++i)
  +    TBLMask.push_back(DAG.getConstant(255, DL, MVT::i64));
  +
  +  // Building a scalable vector for the suffle. So we can use the sve instructions
  +  EVT MaskEltType = EVT::getIntegerVT(*DAG.getContext(), BitsPerElt);
  +  EVT MaskType = EVT::getVectorVT(*DAG.getContext(), MaskEltType, IndexLen);
  +  EVT MaskContainerVT = getContainerForFixedLengthVector(DAG, MaskType);
  +  SDValue VecMask =
  +      DAG.getBuildVector(MaskType, DL, ArrayRef(TBLMask.data(), IndexLen));
  +  SDValue SVEMask = convertToScalableVector(DAG, MaskContainerVT, VecMask);
  +
  +  SDValue Shuffle;
  +  if (IsUndefOrZero || Swap) {
  +    Shuffle = convertFromScalableVector(
  +        DAG, VT,
  +        DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, ContainerVT,
  +                    DAG.getConstant(Intrinsic::aarch64_sve_tbl, DL, MVT::i32),
  +                    Op1, SVEMask));
  +  } else {
  +    if (Subtarget->hasSVE2())
  +      Shuffle = convertFromScalableVector(
  +          DAG, VT,
  +          DAG.getNode(
  +              ISD::INTRINSIC_WO_CHAIN, DL, ContainerVT,
  +              DAG.getConstant(Intrinsic::aarch64_sve_tbl2, DL, MVT::i32), Op1,
  +              Op2, SVEMask));
  +    else
  +      return SDValue();
  +  }
  +  return DAG.getNode(ISD::BITCAST, DL, Op.getValueType(), Shuffle);

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25748-25749
+
+  if (BitsPerElt != 8 && BitsPerElt != 16 && BitsPerElt != 32 &&
+      BitsPerElt != 64)
+    return SDValue();
----------------
Why do we need to test this? What other value it can have?
I thought the only possible values were this 8/16/32/64. Is it possible to have 128, 1024? Are these values we are avoiding?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25761
+      Offset = 255;
+    else if (Offset >= NumElts && FillElements)
+      Offset += IndexLen - NumElts;
----------------
For me this tests is a bit odd.
FillElements is unsigned and not a bool. So are you checking if IndexLen> NumElts?
Is it too difficult to put and example for each case with the fixed mask. So we can see how the mask was and how it has changed.
 if (Swap)
 for Offset < NumElts ->mask = <0,0,7,7> -> <3,3,0,0>
 for Offset > NumElts ->mask = <0,0,7,7>

Maybe this will help to understand if there is any hidden problem with these ifs.

I re-wrote that to something like:

```
  // Set the Mask with the correct indexes
  SmallVector<SDValue, 8> TBLMask;
  for (int Val : ShuffleMask) {
    unsigned Offset = Val;
    if (Swap)
      Offset = Offset < NumElts ? Offset + NumElts : Offset - NumElts;
    // why we need to set the mask to 255 when one of the vectors is zero/undef?
    else if(Offset >= NumElts){
       if(IsUndefOrZero)
           Offset = 255;
       else if (IndexLen > NumElts)
           Offset += IndexLen - NumElts;
    }
    TBLMask.push_back(DAG.getConstant(Offset, DL, MVT::i64));
  }
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25765
+  }
+  for (unsigned i = 0; i < FillElements; ++i)
+    TBLMask.push_back(DAG.getConstant(255, DL, MVT::i64));
----------------
I would suggest add a comment here.
You are populating the rest of the mask here with 255, why?
// Filling up the remaining positions of the mask with 255 because ...

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25768
+
+  EVT MaskContainerVT = getContainerForFixedLengthVector(DAG, MaskType);
+  SDValue VecMask =
----------------
Are you here changing a fixed vector to a scalable vector to use tbl?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152205/new/

https://reviews.llvm.org/D152205