[llvm] [LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (PR #69588)
Igor Kirillov via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 19 05:23:01 PDT 2023
https://github.com/igogo-x86 updated https://github.com/llvm/llvm-project/pull/69588
From bf5d3ba1c6b46c0bdb87563170a90a7dc4f18364 Mon Sep 17 00:00:00 2001
From: Igor Kirillov <igor.kirillov at arm.com>
Date: Mon, 16 Oct 2023 12:56:55 +0000
Subject: [PATCH] [LoopVectorize] Enhance Vectorization decisions for predicate
tail-folded loops with low trip counts
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.
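With this patch, the minimum profitable trip count of a predicate
tail-folded loop is no longer rounded up to a multiple of VF, so loops
with low trip counts can still be vectorized with runtime checks when the
expected trip count, for example from Profile-Guided Optimization (PGO)
data, is high enough.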
---
.../Transforms/Vectorize/LoopVectorize.cpp | 24 +++++++++++++------
1 file changed, 17 insertions(+), 7 deletions(-)
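
To illustrate the change to the minimum profitable trip count described in
the commit message, here is a minimal standalone sketch. It is not the code
in `LoopVectorize.cpp`: `EpiloguePolicy`, `roundUpToMultiple` and
`minProfitableTripCount` are hypothetical names that stand in for
`ScalarEpilogueLowering`, `alignTo` and the tail of
`areRuntimeChecksProfitable`, and the two lower bounds are passed in
directly instead of being derived from instruction costs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for ScalarEpilogueLowering; only the two policies
// relevant to this illustration are modelled.
enum class EpiloguePolicy { Allowed, NotNeededUsePredicate };

// Round Value up to the next multiple of Multiple (Multiple must be > 0).
// Assumed to match what alignTo does for plain integers.
uint64_t roundUpToMultiple(uint64_t Value, uint64_t Multiple) {
  return (Value + Multiple - 1) / Multiple * Multiple;
}

// Pick the larger of the two lower bounds on the trip count. Only when a
// scalar epilogue is allowed is the result rounded up to a multiple of VF,
// which partly compensates for ignoring the epilogue cost. A predicated
// tail-folded loop has no epilogue, so the bound is used as is.
uint64_t minProfitableTripCount(double MinTC1, double MinTC2, uint64_t VF,
                                EpiloguePolicy Policy) {
  uint64_t MinTC = static_cast<uint64_t>(std::ceil(std::max(MinTC1, MinTC2)));
  if (Policy == EpiloguePolicy::Allowed)
    MinTC = roundUpToMultiple(MinTC, VF);
  return MinTC;
}
```

With assumed bounds of 5.2 and 9.1 and a VF of 8, the threshold is 16 when a
scalar epilogue is allowed and 10 when the loop is tail-folded by
predication, so a loop with an expected trip count of 12 would pass the
check only in the tail-folded case.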
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index aa435b0d47aa599..6d3011480f70519 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9746,7 +9746,8 @@ static void checkMixedPrecision(Loop *L, OptimizationRemarkEmitter *ORE) {
static bool areRuntimeChecksProfitable(GeneratedRTChecks &Checks,
VectorizationFactor &VF,
std::optional<unsigned> VScale, Loop *L,
- ScalarEvolution &SE) {
+ ScalarEvolution &SE,
+ ScalarEpilogueLowering SEL) {
InstructionCost CheckCost = Checks.getCost();
if (!CheckCost.isValid())
return false;
@@ -9816,11 +9817,13 @@ static bool areRuntimeChecksProfitable(GeneratedRTChecks &Checks,
// RtC < ScalarC * TC * (1 / X) ==> RtC * X / ScalarC < TC
double MinTC2 = RtC * 10 / ScalarC;
- // Now pick the larger minimum. If it is not a multiple of VF, choose the
- // next closest multiple of VF. This should partly compensate for ignoring
- // the epilogue cost.
+ // Now pick the larger minimum. If it is not a multiple of VF and a scalar
+ // epilogue is allowed, choose the next closest multiple of VF. This should
+ // partly compensate for ignoring the epilogue cost.
uint64_t MinTC = std::ceil(std::max(MinTC1, MinTC2));
- VF.MinProfitableTripCount = ElementCount::getFixed(alignTo(MinTC, IntVF));
+ if (SEL == CM_ScalarEpilogueAllowed)
+ MinTC = alignTo(MinTC, IntVF);
+ VF.MinProfitableTripCount = ElementCount::getFixed(MinTC);
LLVM_DEBUG(
dbgs() << "LV: Minimum required TC for runtime checks to be profitable:"
@@ -9940,7 +9943,14 @@ bool LoopVectorizePass::processLoop(Loop *L) {
else {
if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) {
LLVM_DEBUG(dbgs() << "\n");
- SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
+ // Predicate tail-folded loops are efficient even when the loop
+ // iteration count is low. However, setting the epilogue policy to
+ // `CM_ScalarEpilogueNotAllowedLowTripLoop` prevents vectorizing loops
+ // with runtime checks. It's more effective to let
+ // `areRuntimeChecksProfitable` determine if vectorization is beneficial
+ // for the loop.
+ if (SEL != CM_ScalarEpilogueNotNeededUsePredicate)
+ SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
} else {
LLVM_DEBUG(dbgs() << " But the target considers the trip count too "
"small to consider vectorizing.\n");
@@ -10035,7 +10045,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
Hints.getForce() == LoopVectorizeHints::FK_Enabled;
if (!ForceVectorization &&
!areRuntimeChecksProfitable(Checks, VF, getVScaleForTuning(L, *TTI), L,
- *PSE.getSE())) {
+ *PSE.getSE(), SEL)) {
ORE->emit([&]() {
return OptimizationRemarkAnalysisAliasing(
DEBUG_TYPE, "CantReorderMemOps", L->getStartLoc(),