[PATCH] D113376: [AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2

Wed Nov 10 01:00:51 PST 2021

wwei added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19267-19268
+  // like zip/uzp/trn/rev from having some wrong or undefined behavior.
+  if (Subtarget->getMinSVEVectorSizeInBits() ==
+      Subtarget->getMaxSVEVectorSizeInBits()) {
+    if (SDValue PermOp =
----------------
paulwalker-arm wrote:
> I still don't think this is enough as you've missed my previous `I doubt the logic holds for the case when a vector is legal but not the exact size of the target vector register` comment.  Further discussion is attached to the test zip_v32i8 below...
> 
> I'm starting to wonder if it's worth breaking out lowerShuffleToZIP_UZP_TRN as I feel like once support is added for more of the combinations there's not likely to be much reuse.  For example taking the zip1/zip2 case I think supporting zip1 is likely simple and just works for all legal VTs, but zip2 requires the VT to be exactly the size of the target vector (i.e. `VT.getSizeInBits() == Subtarget->getMinSVEVectorSizeInBits() == Subtarget->getMaxSVEVectorSizeInBits()`) or perhaps a different instruction sequence.
Thanks, I agree with you. Perhaps the current implementation of `lowerShuffleToZIP_UZP_TRN` reusing neon code is not so good, since SVE may have variable length.
Maybe we should identify and classify different shuffle patterns. For zip/uzp/trn/rev these four shuffle types, we can roughly divide them into two categories:
**zip1/uzp1/uzp2/trn1/trn2/revb/revh/revw** （it' simple，and works for all legal VTs）
**zip2/rev**  （complex，and need consider high half or whole register）
I think we can implement these two scenarios in steps.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113376/new/

https://reviews.llvm.org/D113376