[PATCH] D152205: [Aarch64][SVE]SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.

Tue Aug 1 02:02:30 PDT 2023

dtemirbulatov added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25748-25749
+
+  if (BitsPerElt != 8 && BitsPerElt != 16 && BitsPerElt != 32 &&
+      BitsPerElt != 64)
+    return SDValue();
----------------
CarolineConcatto wrote:
> Why do we need to test this? What other value it can have?
> I thought the only possible values were this 8/16/32/64. Is it possible to have 128, 1024? Are these values we are avoiding?
yes, The element type chack was already done in useSVEForFixedLengthVectorVT() function. 

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25768
+
+  EVT MaskContainerVT = getContainerForFixedLengthVector(DAG, MaskType);
+  SDValue VecMask =
----------------
CarolineConcatto wrote:
> Are you here changing a fixed vector to a scalable vector to use tbl?
Right, as those instructions accept only scalable type operand and result.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25149

-  return SDValue();
+  if (!MaxSVESize || MinSVESize != MaxSVESize)
+    return SDValue();
----------------
sdesmalen wrote:
> This could do with a code comment describing why we don't do this when we don't know the exact SVE size.
> 
> I think there are several cases to distinguish here:
> * Elements are selected from either the first source vector, or from the second, but not both. This can be implemented with SVE and when the SVE size isn't known.
> * Elements are selected from both source vectors.
>     * If we only have SVE, then we can mimic this with multiple (single-source) TBL instructions and selects, but I suspect there's little value in handling this, given that SVE2 is the prevalent feature going forward.
>     * If we don't know the size, then we need to do more work to update the indices for the TBL. For example:
> ```<2 x i64> %x, <2 x i64> %y, <4 x i32> <0, 1, 2, 3>
> 
> if SVE VL = 256, then we'd basically insert the <2 x i64> vector value into a conceptual <4 x i64> register where only the first two lanes are relevant, so the offsets must be: <0, 1, 4, 5>
> if SVE VL = 512, then we'd basically insert the <2 x i64> vector value into a conceptual <8 x i64> register where only the first two lanes are relevant, so the offsets must be: <0, 1, 8, 9>
> 
> So we'd somehow need to do something like this:
> 
> <0, 1, 2, 3> + <0, 0, VL, VL>```
I will address run-time index construction in the following patch.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152205/new/

https://reviews.llvm.org/D152205