[llvm] [AArch64][LoopVectorize] Use either fixed-width or scalable VF when tail-folding (PR #67543)

Wed Sep 27 07:21:14 PDT 2023

================
@@ -5142,7 +5142,9 @@ ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
     LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to maximum power of two not "
                          "exceeding the constant trip count: "
                       << ClampedConstTripCount << "\n");
-    return ElementCount::getFixed(ClampedConstTripCount);
+    return ElementCount::get(
+        ClampedConstTripCount,
+        FoldTailByMasking ? MaxVectorElementCount.isScalable() : false);
----------------
Rin18 wrote:

My initial approach was also to add
`return ElementCount::get(ClampedConstTripCount, MaxVectorElementCount.isScalable());`
but that breaks `clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c`, a failure you won't see when running only check-llvm.

The test fails because we specified the exact value of vscale to be 16. The trip count in this test is 64, so a scalable VF of vscale x 64 would cover 16*64=1024 elements per vector iteration, far more than the trip count, and the vector loop was optimised away.

Therefore, when we are not tail folding, we use a fixed-width VF even if `MaxVectorElementCount.isScalable()` is true, because we still want to perform at least one vector iteration.
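
To make the arithmetic concrete, here is a minimal standalone sketch of the selection in the diff. It does not use the LLVM headers: `SketchElementCount`, `clampVF` and `fullVectorIterations` are hypothetical names made up for illustration, and the model assumes that, without tail folding, the vector loop only runs whole iterations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount (not the real class):
// a minimum element count plus a flag saying whether it scales with vscale.
struct SketchElementCount {
  uint64_t MinVal;
  bool Scalable;

  // Elements covered by one vector iteration once vscale is known.
  uint64_t knownElements(uint64_t VScale) const {
    assert(VScale > 0 && "vscale must be positive");
    return Scalable ? MinVal * VScale : MinVal;
  }
};

// Mirrors the expression in the diff: keep the scalable flag only when
// the tail is folded by masking, otherwise fall back to a fixed-width VF.
inline SketchElementCount clampVF(uint64_t ClampedConstTripCount,
                                  bool FoldTailByMasking,
                                  bool MaxVFIsScalable) {
  return {ClampedConstTripCount, FoldTailByMasking ? MaxVFIsScalable : false};
}

// Assumption of this sketch: without tail folding, only whole vector
// iterations execute and the remainder is left to a scalar epilogue.
inline uint64_t fullVectorIterations(uint64_t TripCount, SketchElementCount VF,
                                     uint64_t VScale) {
  return TripCount / VF.knownElements(VScale);
}
```

With a trip count of 64 and vscale fixed at 16, `fullVectorIterations(64, SketchElementCount{64, true}, 16)` is 0 because one iteration would cover 1024 elements, whereas `fullVectorIterations(64, clampVF(64, false, true), 16)` is 1 because the VF stays fixed at 64.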

https://github.com/llvm/llvm-project/pull/67543