[llvm] [AArch64] Update cost model for extracting halves from 128+ bit vectors (PR #155601)

Wed Aug 27 07:56:07 PDT 2025

Gaëtan Bossu <gaetan.bossu at arm.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/155601 at github.com>


================
@@ -5750,11 +5750,13 @@ AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, VectorType *DstTy,
 
   Kind = improveShuffleKindFromMask(Kind, Mask, SrcTy, Index, SubTp);
   bool IsExtractSubvector = Kind == TTI::SK_ExtractSubvector;
-  // A subvector extract can be implemented with an ext (or trivial extract, if
-  // from lane 0). This currently only handles low or high extracts to prevent
-  // SLP vectorizer regressions.
+  // A subvector extract can be implemented with a NEON/SVE ext (or trivial
+  // extract, if from lane 0). This currently only handles low or high extracts
+  // to prevent SLP vectorizer regressions.
+  // Note that SVE's ext instruction is destructive, but it can be fused with
+  // a movprfx to act like a constructive instruction.
   if (IsExtractSubvector && LT.second.isFixedLengthVector()) {
-    if (LT.second.is128BitVector() &&
+    if (LT.second.getFixedSizeInBits() >= AArch64::SVEBitsPerBlock &&
----------------
paulwalker-arm wrote:

Is `AArch64::SVEBitsPerBlock` the best option here?

For the NEON side of things to function correctly, LT must be a 128-bit vector, so the code is just assuming `SVEBitsPerBlock` is 128, which of course it is, but that's beside the point. Looking at the comment, is the intent to ignore cases where the result would be an illegal type? If so, perhaps `!LT.second.is64BitVector()` or just `LT.second.getFixedSizeInBits() >= 128`?

FYI: `AArch64::SVEBitsPerBlock` is a concept that has lost its value. It is pretty much ingrained in the implementation that SVE vectors are described in multiples of NEON vectors.
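
To make the comparison between the three candidate conditions concrete, here is a minimal standalone sketch. It does not use LLVM: `ToyFixedVT`, `ToySVEBitsPerBlock`, and the helper functions are hypothetical stand-ins, and the value 128 is taken from the statement above rather than from the LLVM headers. It only illustrates that the predicates coincide when the legalized fixed-length type is at least 64 bits wide, which is an assumption about what `LT.second` can be at this point.

```cpp
#include <cstdint>

// Hypothetical stand-in for the legalized type (LT.second); this is not
// LLVM's MVT, it only models the three queries discussed in the review.
struct ToyFixedVT {
  uint64_t Bits;
  constexpr bool is64BitVector() const { return Bits == 64; }
  constexpr uint64_t getFixedSizeInBits() const { return Bits; }
};

// Assumed value, mirroring the 128 that the review says the constant holds.
constexpr uint64_t ToySVEBitsPerBlock = 128;

// Condition as written in the patch.
constexpr bool viaSVEBitsPerBlock(ToyFixedVT VT) {
  return VT.getFixedSizeInBits() >= ToySVEBitsPerBlock;
}

// First suggested alternative: reject only the 64-bit NEON case.
constexpr bool viaNot64Bit(ToyFixedVT VT) { return !VT.is64BitVector(); }

// Second suggested alternative: spell out the literal 128.
constexpr bool viaLiteral128(ToyFixedVT VT) {
  return VT.getFixedSizeInBits() >= 128;
}

// True when all three predicates give the same answer for VT.
constexpr bool allAgree(ToyFixedVT VT) {
  return viaSVEBitsPerBlock(VT) == viaNot64Bit(VT) &&
         viaNot64Bit(VT) == viaLiteral128(VT);
}
```

Under that assumption the choice between them is about readability rather than behaviour. The `!is64BitVector()` form would differ for a type narrower than 64 bits, so it relies on such types never reaching this check.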

https://github.com/llvm/llvm-project/pull/155601