[llvm] [LV] Improve code in selectInterleaveCount (NFC) (PR #128002)
Ramkumar Ramachandra via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 27 08:23:59 PST 2025
================
@@ -4998,51 +4997,53 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
}
unsigned EstimatedVF = getEstimatedRuntimeVF(VF, VScaleForTuning);
- unsigned KnownTC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
- if (KnownTC > 0) {
- // At least one iteration must be scalar when this constraint holds. So the
- // maximum available iterations for interleaving is one less.
- unsigned AvailableTC =
- requiresScalarEpilogue(VF.isVector()) ? KnownTC - 1 : KnownTC;
-
- // If trip count is known we select between two prospective ICs, where
- // 1) the aggressive IC is capped by the trip count divided by VF
- // 2) the conservative IC is capped by the trip count divided by (VF * 2)
- // The final IC is selected in a way that the epilogue loop trip count is
- // minimized while maximizing the IC itself, so that we either run the
- // vector loop at least once if it generates a small epilogue loop, or else
- // we run the vector loop at least twice.
-
- unsigned InterleaveCountUB = bit_floor(
- std::max(1u, std::min(AvailableTC / EstimatedVF, MaxInterleaveCount)));
- unsigned InterleaveCountLB = bit_floor(std::max(
- 1u, std::min(AvailableTC / (EstimatedVF * 2), MaxInterleaveCount)));
- MaxInterleaveCount = InterleaveCountLB;
-
- if (InterleaveCountUB != InterleaveCountLB) {
- unsigned TailTripCountUB =
- (AvailableTC % (EstimatedVF * InterleaveCountUB));
- unsigned TailTripCountLB =
- (AvailableTC % (EstimatedVF * InterleaveCountLB));
- // If both produce same scalar tail, maximize the IC to do the same work
- // in fewer vector loop iterations
- if (TailTripCountUB == TailTripCountLB)
- MaxInterleaveCount = InterleaveCountUB;
- }
- } else if (BestKnownTC && *BestKnownTC > 0) {
+
+ // Try to get the exact trip count, or an estimate based on profiling data or
+ // ConstantMax from PSE, failing that.
+ if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop)) {
// At least one iteration must be scalar when this constraint holds. So the
// maximum available iterations for interleaving is one less.
unsigned AvailableTC = requiresScalarEpilogue(VF.isVector())
? (*BestKnownTC) - 1
----------------
artagnon wrote:
Thanks a lot! I assumed there was some sanitization of the metadata, but I wasn't thinking straight: the subtraction does indeed wrap in some degenerate cases, and I've fixed this in https://github.com/llvm/llvm-project/pull/129080.
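The wraparound comes from unsigned arithmetic. The removed `else if` branch only ran when `*BestKnownTC > 0`, while the new `if (auto BestKnownTC = ...)` only checks that a value is present. So if that value is 0 and a scalar epilogue is required, `(*BestKnownTC) - 1` yields `UINT_MAX`. What follows is a minimal standalone sketch in plain C++, not written against LLVM's APIs, of the trip-count arithmetic in this hunk. `bitFloor`, `naiveAvailableTripCount`, `availableTripCount` and `selectIC` are hypothetical helpers. The zero-trip-count guard is one possible fix and is not necessarily what PR 129080 does.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Largest power of two <= X, or 0 when X == 0 (stand-in for llvm::bit_floor).
unsigned bitFloor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2) // doubling keeps P <= X, so P never overflows
    P *= 2;
  return P;
}

// Unguarded version, mirroring the reviewed line: for TC == 0 with a
// required scalar epilogue, 0u - 1 wraps to UINT_MAX.
unsigned naiveAvailableTripCount(unsigned TC, bool RequiresScalarEpilogue) {
  return RequiresScalarEpilogue ? TC - 1 : TC;
}

// Hypothetical guarded version: treat a zero trip count as "unknown"
// instead of letting the subtraction wrap.
std::optional<unsigned> availableTripCount(unsigned TC,
                                           bool RequiresScalarEpilogue) {
  if (TC == 0)
    return std::nullopt;
  return RequiresScalarEpilogue ? TC - 1 : TC;
}

// Picks between an aggressive IC (capped by TC / VF) and a conservative IC
// (capped by TC / (VF * 2)). It prefers the larger one only when both leave
// the same scalar tail, as the removed comment in the hunk describes.
unsigned selectIC(unsigned AvailableTC, unsigned EstimatedVF, unsigned MaxIC) {
  assert(EstimatedVF > 0 && "EstimatedVF must be non-zero");
  unsigned UB =
      bitFloor(std::max(1u, std::min(AvailableTC / EstimatedVF, MaxIC)));
  unsigned LB =
      bitFloor(std::max(1u, std::min(AvailableTC / (EstimatedVF * 2), MaxIC)));
  if (UB != LB &&
      AvailableTC % (EstimatedVF * UB) == AvailableTC % (EstimatedVF * LB))
    return UB;
  return LB;
}
```

Consider a trip count of 64, an estimated VF of 4 and a maximum IC of 16. Both bounds (16 and 8) leave no scalar tail, so `selectIC(64, 4, 16)` picks 16. With 48, 4 and 8, the bounds are 8 and 4. The tails differ (16 vs. 0), so the function falls back to 4.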
https://github.com/llvm/llvm-project/pull/128002