[llvm] [AArch64][LoopVectorize] Use either fixed-width or scalable VF when tail-folding (PR #67543)
Matthew Devereau via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 27 06:10:16 PDT 2023
================
@@ -5142,7 +5142,9 @@ ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to maximum power of two not "
"exceeding the constant trip count: "
<< ClampedConstTripCount << "\n");
- return ElementCount::getFixed(ClampedConstTripCount);
+ return ElementCount::get(
+ ClampedConstTripCount,
+ FoldTailByMasking ? MaxVectorElementCount.isScalable() : false);
----------------
MDevereau wrote:
Is this inline if necessary? If `ConstTripCount` is not a power of 2 then it has already been clamped to one, and `FoldTailByMasking` must be false or we wouldn't have reached here because of the previous if condition.
I think you can just get away with:
```c++
return ElementCount::get(
    ClampedConstTripCount, MaxVectorElementCount.isScalable());
```
There also seems to be a test missing for clamping the trip count, since ignoring the clamped value and doing
```c++
return ElementCount::get(
    ConstTripCount, MaxVectorElementCount.isScalable());
```
also passes check-llvm. However, I don't think this is because of your changes.
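To illustrate why a test with a non-power-of-two constant trip count would tell the two variants apart, here is a minimal standalone sketch. It does not use LLVM; `clampToPowerOfTwo` and `clampMatters` are hypothetical helpers that only model the "maximum power of two not exceeding the constant trip count" described in the debug message above.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the clamping step (not LLVM's implementation):
// returns the largest power of two that does not exceed TC, or 0 for TC == 0.
constexpr uint32_t clampToPowerOfTwo(uint32_t TC) {
  uint32_t Result = 0;
  // P wraps to 0 after 2^31, which ends the loop for very large TC.
  for (uint32_t P = 1; P != 0 && P <= TC; P <<= 1)
    Result = P;
  return Result;
}

// Hypothetical helper: true when returning the unclamped trip count would
// produce a different element count than returning the clamped one.
constexpr bool clampMatters(uint32_t TC) {
  return clampToPowerOfTwo(TC) != TC;
}

// A power-of-two trip count is unchanged by clamping, so a test using it
// cannot distinguish the clamped value from the unclamped one.
static_assert(!clampMatters(8), "8 is already a power of two");

// A non-power-of-two trip count is clamped downwards, so it can.
static_assert(clampToPowerOfTwo(6) == 4, "6 clamps down to 4");
static_assert(clampMatters(6), "6 is changed by clamping");
```

Under this model, only a loop with a constant trip count such as 6 would expose the difference; whether the existing tests have such a loop that reaches this code path is something I have not checked.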
https://github.com/llvm/llvm-project/pull/67543