[llvm] [LV] Enable considering higher VFs when data extend ops are present i… (PR #137593)

Tue May 13 05:27:24 PDT 2025

================
@@ -362,10 +362,15 @@ AArch64TTIImpl::getInlineCallPenalty(const Function *F, const CallBase &Call,
 }
 
 bool AArch64TTIImpl::shouldMaximizeVectorBandwidth(
-    TargetTransformInfo::RegisterKind K) const {
+    TargetTransformInfo::RegisterKind K, const unsigned WidestType,
+    const unsigned SmallestType) const {
   assert(K != TargetTransformInfo::RGK_Scalar);
-  return (K == TargetTransformInfo::RGK_FixedWidthVector &&
-          ST->isNeonAvailable());
+  // For loops with extend operations e.g. zext, sext etc., limiting the max VF
+  // based on widest type inhibits considering higher VFs even though
+  // vectorizing with higher VF might be profitable. In such cases, we should
+  // limit the max VF based on smallest type and the decision whether a
+  // particular VF is beneficial or not be left to cost model.
+  return WidestType != SmallestType;
----------------
david-arm wrote:

I don't really understand why this is any different to simply returning `true` here, because `LoopVectorizationCostModel::getMaximizedVFForTarget` will only change the `MaxVF` if the types are different anyway. For example, `MaxVectorElementCount` will be identical to `MaxVectorElementCountMaxBW` when all types are the same. @fhahn any thoughts?
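
To illustrate the point about the two element counts, here is a minimal standalone sketch. It is not the actual `getMaximizedVFForTarget` code: the helper names `floorPow2`, `maxElementCount` and `maxElementCountMaxBW` are hypothetical, and the sketch assumes each count is the register width in bits divided by the type width in bits, rounded down to a power of two.

```cpp
// Hypothetical helper: largest power of two that is <= N (requires N > 0).
constexpr unsigned floorPow2(unsigned N) {
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Assumed model of the default limit: lanes are sized by the widest type.
constexpr unsigned maxElementCount(unsigned RegisterBits,
                                   unsigned WidestTypeBits) {
  return floorPow2(RegisterBits / WidestTypeBits);
}

// Assumed model of the bandwidth-maximized limit: lanes are sized by the
// smallest type instead.
constexpr unsigned maxElementCountMaxBW(unsigned RegisterBits,
                                        unsigned SmallestTypeBits) {
  return floorPow2(RegisterBits / SmallestTypeBits);
}

// i8 -> i32 extend in a 128-bit register: the two limits differ.
static_assert(maxElementCount(128, 32) == 4, "widest type gives 4 lanes");
static_assert(maxElementCountMaxBW(128, 8) == 16, "smallest type gives 16 lanes");

// All types i32: both limits agree, so maximizing bandwidth changes nothing.
static_assert(maxElementCount(128, 32) == maxElementCountMaxBW(128, 32),
              "identical types give identical limits");
```

Under these assumptions, the two limits only diverge when `WidestType != SmallestType`, so returning `true` unconditionally would produce the same `MaxVF` in the equal-type case.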

Also, do we still need to test whether SVE or NEON is available? For example, something like this:

```
  return ST->isNeonAvailable() || ST->isSVEAvailable();
```

@huntergr-arm definitely found performance regressions when maximising the bandwidth for SVE. I'll see if I can find some examples.

https://github.com/llvm/llvm-project/pull/137593