[llvm] [LV] Enable considering higher VFs when data extend ops are present i… (PR #137593)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Tue May 13 05:27:24 PDT 2025
================
@@ -362,10 +362,15 @@ AArch64TTIImpl::getInlineCallPenalty(const Function *F, const CallBase &Call,
}
bool AArch64TTIImpl::shouldMaximizeVectorBandwidth(
- TargetTransformInfo::RegisterKind K) const {
+ TargetTransformInfo::RegisterKind K, const unsigned WidestType,
+ const unsigned SmallestType) const {
assert(K != TargetTransformInfo::RGK_Scalar);
- return (K == TargetTransformInfo::RGK_FixedWidthVector &&
- ST->isNeonAvailable());
+ // For loops with extend operations e.g. zext, sext etc., limiting the max VF
+ // based on widest type inhibits considering higher VFs even though
+ // vectorizing with higher VF might be profitable. In such cases, we should
+ // limit the max VF based on smallest type and the decision whether a
+ // particular VF is beneficial or not be left to cost model.
+ return WidestType != SmallestType;
----------------
david-arm wrote:
I don't really understand why this is any different to simply returning `true` here, because `LoopVectorizationCostModel::getMaximizedVFForTarget` will only change the `MaxVF` if the types are different anyway. For example, `MaxVectorElementCount` will be identical to `MaxVectorElementCountMaxBW` when all types are the same. @fhahn any thoughts?
Also, do we still need to test whether SVE or NEON is available? For example, something like this:
```
return ST->isNeonAvailable() || ST->isSVEAvailable();
```
@huntergr-arm definitely found performance regressions when maximising the bandwidth for SVE. I'll see if I can find some examples.
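To illustrate the first point above (that the two candidate maximum VFs coincide when all types in the loop have the same width), here is a minimal, simplified sketch. It is not the LLVM implementation: `bitFloor`, `MaxVFs` and `computeMaxVFs` are hypothetical names, and the model assumes a fixed-width register where each candidate is simply the register width divided by the relevant type width, rounded down to a power of two.

```cpp
#include <cstdint>

// Hypothetical helper: largest power of two that is <= X (requires X > 0).
constexpr uint64_t bitFloor(uint64_t X) {
  uint64_t P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical result type holding both candidate element counts.
struct MaxVFs {
  uint64_t FromWidestType;   // default limit, based on the widest type
  uint64_t FromSmallestType; // "maximize bandwidth" limit, based on the smallest type
};

// Simplified model (an assumption, not the real cost-model code):
// each candidate VF is the register width divided by a type width.
constexpr MaxVFs computeMaxVFs(uint64_t RegisterBits, uint64_t WidestTypeBits,
                               uint64_t SmallestTypeBits) {
  return {bitFloor(RegisterBits / WidestTypeBits),
          bitFloor(RegisterBits / SmallestTypeBits)};
}

// All types are i32: both limits agree, so maximizing bandwidth changes nothing.
static_assert(computeMaxVFs(128, 32, 32).FromWidestType ==
                  computeMaxVFs(128, 32, 32).FromSmallestType,
              "identical types give identical limits");

// i8 extended to i32: the limits differ (4 vs 16 for a 128-bit register).
static_assert(computeMaxVFs(128, 32, 8).FromWidestType == 4, "widest-type limit");
static_assert(computeMaxVFs(128, 32, 8).FromSmallestType == 16, "smallest-type limit");
```

Under this model, a `WidestType != SmallestType` check only filters out cases where returning `true` would already have no effect, which is the concern raised in the comment.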
https://github.com/llvm/llvm-project/pull/137593