[llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)
Joel E. Denny via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 12 16:57:04 PDT 2025
https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/128785
>From 843b4cf5646f33a96a075a2a4b3230d00b70ca8f Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Wed, 12 Mar 2025 18:18:01 -0400
Subject: [PATCH] [LoopPeel] Fix branch weights' effect on block frequencies
For example:
```
declare void @f(i32)
define void @test(i32 %n) {
entry:
br label %do.body
do.body:
%i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
%inc = add i32 %i, 1
call void @f(i32 %i)
%c = icmp sge i32 %inc, %n
br i1 %c, label %do.end, label %do.body, !prof !0
do.end:
ret void
}
!0 = !{!"branch_weights", i32 1, i32 9}
```
Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
1/(1+9). That is, the loop is likely to exit every 10 iterations and
thus has an estimated trip count of 10. `opt
-passes='print<block-freq>'` shows that 10 is indeed the frequency of
the loop body:
```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
- entry: float = 1.0, int = 1801439852625920
- do.body: float = 10.0, int = 18014398509481984
- do.end: float = 1.0, int = 1801439852625920
```
Key Observation: The frequency of reaching any particular iteration is
less than for the previous iteration because the previous iteration
has a non-zero probability of exiting the loop. This observation
holds even though every loop iteration, once actually reached, has
exactly the same probability of exiting and thus exactly the same
branch weights.
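The key observation can be checked numerically with a small standalone sketch. It is not LLVM code: the helper names `iterationFrequencies` and `totalBodyFrequency` are hypothetical, and it assumes one loop entry with a single exiting latch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Frequency of reaching each of the first Count iterations of a loop that is
// entered once and whose latch exits with probability
// ExitWeight / (ExitWeight + BackedgeWeight) on every iteration.
// (Hypothetical helper for illustration; not part of LLVM.)
std::vector<double> iterationFrequencies(double ExitWeight,
                                         double BackedgeWeight,
                                         unsigned Count) {
  const double ContinueProb = BackedgeWeight / (ExitWeight + BackedgeWeight);
  std::vector<double> Freqs;
  double Reach = 1.0; // The first iteration is always reached.
  for (unsigned I = 0; I < Count; ++I) {
    Freqs.push_back(Reach);
    Reach *= ContinueProb; // The next iteration is reached less often.
  }
  return Freqs;
}

// Sum of the geometric series over all iterations: 1 / exit probability.
double totalBodyFrequency(double ExitWeight, double BackedgeWeight) {
  return (ExitWeight + BackedgeWeight) / ExitWeight;
}
```

For the weights in the example, `iterationFrequencies(1.0, 9.0, 3)` yields 1.0, 0.9, and 0.81, and `totalBodyFrequency(1.0, 9.0)` yields the loop body frequency of 10 reported by BFI above.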
Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to
peel 2 iterations and insert them before the remaining loop. We
expect the key observation above not to change, but it does under the
implementation without this patch. The block frequency becomes 1.0
for the first iteration, 0.9 for the second, and 6.4 for the main loop
body. Again, a decreasing frequency is expected, but it decreases too
much: the total frequency of the original loop body becomes 8.3. The
new branch weights reveal the problem:
```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```
The exit probability is now 1/10 for the first peeled iteration, 1/9
for the second, and 1/8 for the remaining loop iterations. This
behavior appears to have been intended to ensure a decreasing block
frequency. However, as in the key observation above for the original
loop, that happens correctly without decreasing the branch weights
across iterations.
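The frequencies quoted above (1.0, 0.9, 6.4, total 8.3) can be reproduced from the three pre-patch weight pairs with the following standalone sketch. The type and function names are hypothetical, and the model assumes a single exiting branch per iteration.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// {exit weight, continue weight} of one exiting branch.
using Weights = std::pair<double, double>;

struct PeeledFrequencies {
  std::vector<double> Peeled; // Frequency of each peeled iteration.
  double RemainingLoop = 0.0; // Frequency of the remaining loop body.
  double Total = 0.0;         // Sum over all copies of the loop body.
};

// Hypothetical model (not LLVM code): propagate the reach probability through
// the peeled iterations, then through the remaining loop.
PeeledFrequencies frequenciesAfterPeeling(
    const std::vector<Weights> &PeeledWeights, Weights LoopWeights) {
  PeeledFrequencies R;
  double Reach = 1.0; // The first peeled iteration is always reached.
  for (auto [Exit, Continue] : PeeledWeights) {
    R.Peeled.push_back(Reach);
    R.Total += Reach;
    Reach *= Continue / (Exit + Continue);
  }
  auto [Exit, Continue] = LoopWeights;
  // A loop entered with frequency Reach and exit probability p has a body
  // frequency of Reach / p.
  R.RemainingLoop = Reach * (Exit + Continue) / Exit;
  R.Total += R.RemainingLoop;
  return R;
}
```

Calling `frequenciesAfterPeeling({{1.0, 9.0}, {1.0, 8.0}}, {1.0, 7.0})` gives peeled frequencies of 1.0 and 0.9, a remaining loop frequency of 6.4, and a total of 8.3 rather than the original 10.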
This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the frequency for every
iteration is the same as it was in the original loop. The total
frequency of the loop body, summed across all its occurrences, thus
remains 10 after peeling.
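As a short worked check of that claim, let $p$ be the per-iteration exit probability and $P$ the number of peeled iterations (symbols introduced here for illustration, assuming a single exiting latch). With unchanged branch weights, the peeled iterations contribute a finite geometric sum and the remaining loop contributes its entry frequency divided by $p$:

```latex
\sum_{k=0}^{P-1} (1-p)^k + \frac{(1-p)^P}{p}
  = \frac{1 - (1-p)^P}{p} + \frac{(1-p)^P}{p}
  = \frac{1}{p}
```

For $p = 1/10$ and $P = 2$ this is $1 + 0.9 + 0.81 \cdot 10 = 10$, independent of how many iterations are peeled.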
Unfortunately, that change means a later analysis can no longer
estimate the remaining loop's trip count accurately from branch weights
when it examines that loop in isolation, without considering the
probability of actually reaching it. For that purpose, this patch
stores the new trip count as separate metadata named
`llvm.loop.estimated_trip_count` and extends
`llvm::getLoopEstimatedTripCount` to prefer it, if present, over
branch weights.
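The lookup order can be modeled with a simplified standalone sketch. It does not use LLVM's API: `LatchProfile`, `tripCountFromWeights`, and `estimatedTripCount` are hypothetical names, and the branch-weight estimate (backedge weight over exit weight, rounded to nearest, plus one) is an assumed simplification. As in the diff below, the metadata is consulted only when the branch weights themselves yield an estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-in for the profile data attached to a loop latch.
struct LatchProfile {
  uint32_t ExitWeight;     // Weight of the edge leaving the loop.
  uint32_t BackedgeWeight; // Weight of the edge back to the header.
  // Value of llvm.loop.estimated_trip_count, if that metadata is present.
  std::optional<unsigned> EstimatedTripCountMD;
};

// Trip count implied by branch weights alone (assumed simplification):
// backedge/exit rounded to nearest, plus one for the exiting iteration,
// saturated at the maximum unsigned value.
std::optional<unsigned> tripCountFromWeights(const LatchProfile &P) {
  if (P.ExitWeight == 0)
    return std::nullopt; // No usable estimate without an exit weight.
  const uint64_t Exit = P.ExitWeight;
  const uint64_t BackedgeCount = (P.BackedgeWeight + Exit / 2) / Exit;
  const uint64_t Max = std::numeric_limits<unsigned>::max();
  return static_cast<unsigned>(std::min(BackedgeCount + 1, Max));
}

// Prefer the separately stored trip count over the branch-weight estimate.
std::optional<unsigned> estimatedTripCount(const LatchProfile &P) {
  std::optional<unsigned> FromWeights = tripCountFromWeights(P);
  if (!FromWeights)
    return std::nullopt;
  if (P.EstimatedTripCountMD)
    return P.EstimatedTripCountMD;
  return FromWeights;
}
```

With weights 1 and 9 and no metadata, this sketch returns 10. After peeling 2 iterations, the same weights with a stored count of 8 return 8.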
An alternative fix is for `llvm::getLoopEstimatedTripCount` to
subtract the `llvm.loop.peeled.count` metadata from the trip count
estimated by a loop's branch weights. However, there might be other
loop transformations that still corrupt block frequencies in a similar
manner and require a similar fix. `llvm.loop.estimated_trip_count` is
intended to provide a general way to store estimated trip counts when
branch weights cannot directly store them.
This patch introduces several FIXME comments that need to be addressed
before it can land.
---
.../include/llvm/Transforms/Utils/LoopUtils.h | 25 ++-
llvm/lib/Transforms/Utils/LoopPeel.cpp | 145 +++++++-----------
llvm/lib/Transforms/Utils/LoopUtils.cpp | 20 ++-
.../LoopUnroll/peel-branch-weights-simple.ll | 66 ++++++++
.../LoopUnroll/peel-branch-weights.ll | 64 ++++----
.../LoopUnroll/peel-loop-pgo-deopt.ll | 11 +-
.../Transforms/LoopUnroll/peel-loop-pgo.ll | 13 +-
.../Transforms/LoopVectorize/X86/pr81872.ll | 18 ++-
8 files changed, 208 insertions(+), 154 deletions(-)
create mode 100644 llvm/test/Transforms/LoopUnroll/peel-branch-weights-simple.ll
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 8f4c0c88336ac..82d23a4b68ea1 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -315,7 +315,8 @@ TransformationMode hasLICMVersioningTransformation(const Loop *L);
void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
unsigned V = 0);
-/// Returns a loop's estimated trip count based on branch weight metadata.
+/// Returns a loop's estimated trip count based on
+/// llvm.loop.estimated_trip_count metadata or, if none, branch weight metadata.
/// In addition if \p EstimatedLoopInvocationWeight is not null it is
/// initialized with weight of loop's latch leading to the exit.
/// Returns a valid positive trip count, saturated at UINT_MAX, or std::nullopt
@@ -324,13 +325,21 @@ std::optional<unsigned>
getLoopEstimatedTripCount(Loop *L,
unsigned *EstimatedLoopInvocationWeight = nullptr);
-/// Set a loop's branch weight metadata to reflect that loop has \p
-/// EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight exits
-/// through latch. Returns true if metadata is successfully updated, false
-/// otherwise. Note that loop must have a latch block which controls loop exit
-/// in order to succeed.
-bool setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
- unsigned EstimatedLoopInvocationWeight);
+/// Set a loop's llvm.loop.estimated_trip_count metadata and, if \p
+/// EstimatedLoopInvocationWeight, branch weight metadata to reflect that loop
+/// has \p EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight
+/// exit weight through latch. Returns true if metadata is successfully updated,
+/// false otherwise. Note that loop must have a latch block which controls loop
+/// exit in order to succeed.
+///
+/// The use case for not setting branch weight metadata is when the original
+/// branch weight metadata is correct for computing block frequencies but the
+/// trip count has changed due to a loop transformation. The branch weight
+/// metadata cannot be adjusted to reflect the new trip count, so we store the
+/// new trip count separately.
+bool setLoopEstimatedTripCount(
+ Loop *L, unsigned EstimatedTripCount,
+ std::optional<unsigned> EstimatedLoopInvocationWeight);
/// Check inner loop (L) backedge count is known to be invariant on all
/// iterations of its outer loop. If the loop has no parent, this is trivially
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 0f3a92b65686c..a81d55611fad8 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -655,84 +655,6 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
}
}
-struct WeightInfo {
- // Weights for current iteration.
- SmallVector<uint32_t> Weights;
- // Weights to subtract after each iteration.
- const SmallVector<uint32_t> SubWeights;
-};
-
-/// Update the branch weights of an exiting block of a peeled-off loop
-/// iteration.
-/// Let F is a weight of the edge to continue (fallthrough) into the loop.
-/// Let E is a weight of the edge to an exit.
-/// F/(F+E) is a probability to go to loop and E/(F+E) is a probability to
-/// go to exit.
-/// Then, Estimated ExitCount = F / E.
-/// For I-th (counting from 0) peeled off iteration we set the weights for
-/// the peeled exit as (EC - I, 1). It gives us reasonable distribution,
-/// The probability to go to exit 1/(EC-I) increases. At the same time
-/// the estimated exit count in the remainder loop reduces by I.
-/// To avoid dealing with division rounding we can just multiple both part
-/// of weights to E and use weight as (F - I * E, E).
-static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
- setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
- for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
- if (SubWeight != 0)
- // Don't set the probability of taking the edge from latch to loop header
- // to less than 1:1 ratio (meaning Weight should not be lower than
- // SubWeight), as this could significantly reduce the loop's hotness,
- // which would be incorrect in the case of underestimating the trip count.
- Info.Weights[Idx] =
- Info.Weights[Idx] > SubWeight
- ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
- : SubWeight;
-}
-
-/// Initialize the weights for all exiting blocks.
-static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
- Loop *L) {
- SmallVector<BasicBlock *> ExitingBlocks;
- L->getExitingBlocks(ExitingBlocks);
- for (BasicBlock *ExitingBlock : ExitingBlocks) {
- Instruction *Term = ExitingBlock->getTerminator();
- SmallVector<uint32_t> Weights;
- if (!extractBranchWeights(*Term, Weights))
- continue;
-
- // See the comment on updateBranchWeights() for an explanation of what we
- // do here.
- uint32_t FallThroughWeights = 0;
- uint32_t ExitWeights = 0;
- for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
- if (L->contains(Succ))
- FallThroughWeights += Weight;
- else
- ExitWeights += Weight;
- }
-
- // Don't try to update weights for degenerate case.
- if (FallThroughWeights == 0)
- continue;
-
- SmallVector<uint32_t> SubWeights;
- for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
- if (!L->contains(Succ)) {
- // Exit weights stay the same.
- SubWeights.push_back(0);
- continue;
- }
-
- // Subtract exit weights on each iteration, distributed across all
- // fallthrough edges.
- double W = (double)Weight / (double)FallThroughWeights;
- SubWeights.push_back((uint32_t)(ExitWeights * W));
- }
-
- WeightInfos.insert({Term, {std::move(Weights), std::move(SubWeights)}});
- }
-}
-
/// Clones the body of the loop L, putting it between \p InsertTop and \p
/// InsertBot.
/// \param IterNumber The serial number of the iteration currently being
@@ -1006,11 +928,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
Instruction *LatchTerm =
cast<Instruction>(cast<BasicBlock>(Latch)->getTerminator());
- // If we have branch weight information, we'll want to update it for the
- // newly created branches.
- DenseMap<Instruction *, WeightInfo> Weights;
- initBranchWeights(Weights, L);
-
// Identify what noalias metadata is inside the loop: if it is inside the
// loop, the associated metadata must be cloned for each iteration.
SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
@@ -1038,11 +955,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
assert(DT.verify(DominatorTree::VerificationLevel::Fast));
#endif
- for (auto &[Term, Info] : Weights) {
- auto *TermCopy = cast<Instruction>(VMap[Term]);
- updateBranchWeights(TermCopy, Info);
- }
-
// Remove Loop metadata from the latch branch instruction
// because it is not the Loop's latch branch anymore.
auto *LatchTermCopy = cast<Instruction>(VMap[LatchTerm]);
@@ -1068,15 +980,62 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
PHI->setIncomingValueForBlock(NewPreHeader, NewVal);
}
- for (const auto &[Term, Info] : Weights) {
- setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
- }
-
// Update Metadata for count of peeled off iterations.
unsigned AlreadyPeeled = 0;
if (auto Peeled = getOptionalIntLoopAttribute(L, PeeledCountMetaData))
AlreadyPeeled = *Peeled;
- addStringMetadataToLoop(L, PeeledCountMetaData, AlreadyPeeled + PeelCount);
+ unsigned TotalPeeled = AlreadyPeeled + PeelCount;
+ addStringMetadataToLoop(L, PeeledCountMetaData, TotalPeeled);
+
+ // Update metadata for the estimated trip count. The original branch weight
+ // metadata is already correct for both the remaining loop and the peeled loop
+ // iterations, so don't adjust it.
+ //
+ // For example, consider what happens when peeling 2 iterations from a loop
+ // with an estimated trip count of 10 and inserting them before the remaining
+ // loop. Each of the peeled iterations and each iteration in the remaining
+ // loop still has the same probability of exiting the *entire original* loop
+ // as it did when in the original loop, and thus it should still have the same
+ // branch weights. The peeled iterations' non-zero probabilities of exiting
+ // already appropriately reduce the probability of reaching the remaining
+ // iterations just as they did in the original loop. Trying to also adjust
+ // the remaining loop's branch weights to reflect its new trip count of 8 will
+ // erroneously further reduce its block frequencies. However, in case an
+ // analysis later needs to determine the trip count of the remaining loop
+ // while examining it in isolation without considering the probability of
+ // actually reaching it, we store the new trip count as separate metadata.
+ //
+ // FIXME: getLoopEstimatedTripCount and setLoopEstimatedTripCount skip loops
+ // that don't match the restrictions of getExpectedExitLoopLatchBranch in
+ // LoopUtils.cpp. For example,
+ // llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll (introduced by
+ // b43a4d0850d5) has multiple exits. Should we try to extend them to handle
+ // such cases? For now, we just don't try to record
+ // llvm.loop.estimated_trip_count for such cases, so the original branch
+ // weights will have to do.
+ if (auto EstimatedTripCount = getLoopEstimatedTripCount(L)) {
+ // FIXME: The previous updateBranchWeights implementation had this
+ // comment:
+ //
+ // Don't set the probability of taking the edge from latch to loop header
+ // to less than 1:1 ratio (meaning Weight should not be lower than
+ // SubWeight), as this could significantly reduce the loop's hotness,
+ // which would be incorrect in the case of underestimating the trip count.
+ //
+ // See e8d5db206c2f commit log for further discussion. That seems to
+ // suggest that we should avoid ever setting a trip count of < 2 here
+ // (equal chance of continuing and exiting means the loop will likely
+ // continue once and then exit once). Or is keeping the original branch
+ // weights already a sufficient improvement for whatever analysis cares
+ // about this case?
+ unsigned EstimatedTripCountNew = *EstimatedTripCount;
+ if (EstimatedTripCountNew < TotalPeeled) // FIXME: TotalPeeled + 2?
+ EstimatedTripCountNew = 0; // FIXME: = 2?
+ else
+ EstimatedTripCountNew -= TotalPeeled;
+ setLoopEstimatedTripCount(L, EstimatedTripCountNew,
+ /*EstimatedLoopInvocationWeight=*/std::nullopt);
+ }
if (Loop *ParentLoop = L->getParentLoop())
L = ParentLoop;
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 84c08556f8a25..ae91dbd5bf902 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -53,6 +53,8 @@ using namespace llvm::PatternMatch;
static const char *LLVMLoopDisableNonforced = "llvm.loop.disable_nonforced";
static const char *LLVMLoopDisableLICM = "llvm.licm.disable";
+static const char *LLVMLoopEstimatedTripCount =
+ "llvm.loop.estimated_trip_count";
bool llvm::formDedicatedExitBlocks(Loop *L, DominatorTree *DT, LoopInfo *LI,
MemorySSAUpdater *MSSAU,
@@ -864,14 +866,22 @@ llvm::getLoopEstimatedTripCount(Loop *L,
getEstimatedTripCount(LatchBranch, L, ExitWeight)) {
if (EstimatedLoopInvocationWeight)
*EstimatedLoopInvocationWeight = ExitWeight;
+ // FIXME: Where else are branch weights directly used for estimating loop
+ // trip counts? They should also be updated to use
+ // LLVMLoopEstimatedTripCount when present... or to just call this
+ // function.
+ if (auto EstimatedTripCount =
+ getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount))
+ return EstimatedTripCount;
return *EstTripCount;
}
}
return std::nullopt;
}
-bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
- unsigned EstimatedloopInvocationWeight) {
+bool llvm::setLoopEstimatedTripCount(
+ Loop *L, unsigned EstimatedTripCount,
+ std::optional<unsigned> EstimatedloopInvocationWeight) {
// At the moment, we currently support changing the estimate trip count of
// the latch branch only. We could extend this API to manipulate estimated
// trip counts for any exit.
@@ -879,12 +889,16 @@ bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
if (!LatchBranch)
return false;
+ addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount, EstimatedTripCount);
+ if (!EstimatedloopInvocationWeight)
+ return true;
+
// Calculate taken and exit weights.
unsigned LatchExitWeight = 0;
unsigned BackedgeTakenWeight = 0;
if (EstimatedTripCount > 0) {
- LatchExitWeight = EstimatedloopInvocationWeight;
+ LatchExitWeight = *EstimatedloopInvocationWeight;
BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights-simple.ll b/llvm/test/Transforms/LoopUnroll/peel-branch-weights-simple.ll
new file mode 100644
index 0000000000000..418f5c417c457
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/peel-branch-weights-simple.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
+; RUN: opt < %s -S -passes=loop-unroll -unroll-force-peel-count=2 2>&1 | FileCheck %s
+
+declare void @f(i32)
+
+; Test branch weights and estimated trip count metadata for simple loop after
+; peeling.
+define void @test(i32 %n) {
+; CHECK-LABEL: @test(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[DO_BODY_PEEL_BEGIN:%.*]]
+; CHECK: do.body.peel.begin:
+; CHECK-NEXT: br label [[DO_BODY_PEEL:%.*]]
+; CHECK: do.body.peel:
+; CHECK-NEXT: [[INC_PEEL:%.*]] = add i32 0, 1
+; CHECK-NEXT: call void @f(i32 0)
+; CHECK-NEXT: [[C_PEEL:%.*]] = icmp sge i32 [[INC_PEEL]], [[N:%.*]]
+; CHECK-NEXT: br i1 [[C_PEEL]], label [[DO_END:%.*]], label [[DO_BODY_PEEL_NEXT:%.*]], !prof [[PROF0:![0-9]+]]
+; CHECK: do.body.peel.next:
+; CHECK-NEXT: br label [[DO_BODY_PEEL2:%.*]]
+; CHECK: do.body.peel2:
+; CHECK-NEXT: [[INC_PEEL3:%.*]] = add i32 [[INC_PEEL]], 1
+; CHECK-NEXT: call void @f(i32 [[INC_PEEL]])
+; CHECK-NEXT: [[C_PEEL4:%.*]] = icmp sge i32 [[INC_PEEL3]], [[N]]
+; CHECK-NEXT: br i1 [[C_PEEL4]], label [[DO_END]], label [[DO_BODY_PEEL_NEXT1:%.*]], !prof [[PROF0]]
+; CHECK: do.body.peel.next1:
+; CHECK-NEXT: br label [[DO_BODY_PEEL_NEXT5:%.*]]
+; CHECK: do.body.peel.next5:
+; CHECK-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
+; CHECK: entry.peel.newph:
+; CHECK-NEXT: br label [[DO_BODY:%.*]]
+; CHECK: do.body:
+; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[INC_PEEL3]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC:%.*]], [[DO_BODY]] ]
+; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT: call void @f(i32 [[I]])
+; CHECK-NEXT: [[C:%.*]] = icmp sge i32 [[INC]], [[N]]
+; CHECK-NEXT: br i1 [[C]], label [[DO_END_LOOPEXIT:%.*]], label [[DO_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP1:![0-9]+]]
+; CHECK: do.end.loopexit:
+; CHECK-NEXT: br label [[DO_END]]
+; CHECK: do.end:
+; CHECK-NEXT: ret void
+;
+
+entry:
+ br label %do.body
+
+do.body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
+ %inc = add i32 %i, 1
+ call void @f(i32 %i)
+ %c = icmp sge i32 %inc, %n
+ br i1 %c, label %do.end, label %do.body, !prof !0
+
+do.end:
+ ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 9}
+
+;.
+; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 9}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK: [[META3]] = !{!"llvm.loop.estimated_trip_count", i32 8}
+; CHECK: [[META4]] = !{!"llvm.loop.unroll.disable"}
+;.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll b/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
index c58f8f1f4e4ee..63a0dd4b4b4f9 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
@@ -15,9 +15,9 @@ define void @test() {
; CHECK: loop.peel:
; CHECK-NEXT: [[X_PEEL:%.*]] = call i32 @get.x()
; CHECK-NEXT: switch i32 [[X_PEEL]], label [[LOOP_LATCH_PEEL:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
; CHECK-NEXT: ], !prof [[PROF0:![0-9]+]]
; CHECK: loop.latch.peel:
; CHECK-NEXT: br label [[LOOP_PEEL_NEXT:%.*]]
@@ -26,10 +26,10 @@ define void @test() {
; CHECK: loop.peel2:
; CHECK-NEXT: [[X_PEEL3:%.*]] = call i32 @get.x()
; CHECK-NEXT: switch i32 [[X_PEEL3]], label [[LOOP_LATCH_PEEL4:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL4]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
-; CHECK-NEXT: ], !prof [[PROF1:![0-9]+]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL4]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT: ], !prof [[PROF0]]
; CHECK: loop.latch.peel4:
; CHECK-NEXT: br label [[LOOP_PEEL_NEXT1:%.*]]
; CHECK: loop.peel.next1:
@@ -41,31 +41,33 @@ define void @test() {
; CHECK: loop:
; CHECK-NEXT: [[X:%.*]] = call i32 @get.x()
; CHECK-NEXT: switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT_LOOPEXIT]]
-; CHECK-NEXT: ], !prof [[PROF2:![0-9]+]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT_LOOPEXIT]]
+; CHECK-NEXT: ], !prof [[PROF0]]
; CHECK: loop.latch:
-; CHECK-NEXT: br label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br label [[LOOP]], !llvm.loop [[LOOP1:![0-9]+]]
; CHECK: loop.exit.loopexit:
; CHECK-NEXT: br label [[LOOP_EXIT]]
; CHECK: loop.exit:
; CHECK-NEXT: ret void
+;
+; DISABLEADV-LABEL: @test(
+; DISABLEADV-NEXT: entry:
+; DISABLEADV-NEXT: br label [[LOOP:%.*]]
+; DISABLEADV: loop:
+; DISABLEADV-NEXT: [[X:%.*]] = call i32 @get.x()
+; DISABLEADV-NEXT: switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
+; DISABLEADV-NEXT: i32 0, label [[LOOP_LATCH]]
+; DISABLEADV-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
+; DISABLEADV-NEXT: i32 2, label [[LOOP_EXIT]]
+; DISABLEADV-NEXT: ], !prof [[PROF0:![0-9]+]]
+; DISABLEADV: loop.latch:
+; DISABLEADV-NEXT: br label [[LOOP]]
+; DISABLEADV: loop.exit:
+; DISABLEADV-NEXT: ret void
+;
-; DISABLEADV-LABEL: @test()
-; DISABLEADV-NEXT: entry:
-; DISABLEADV-NEXT: br label %loop
-; DISABLEADV: loop
-; DISABLEADV-NEXT: %x = call i32 @get.x()
-; DISABLEADV-NEXT: switch i32 %x, label %loop.latch [
-; DISABLEADV-NEXT: i32 0, label %loop.latch
-; DISABLEADV-NEXT: i32 1, label %loop.exit
-; DISABLEADV-NEXT: i32 2, label %loop.exit
-; DISABLEADV-NEXT: ], !prof !0
-; DISABLEADV: loop.latch:
-; DISABLEADV-NEXT: br label %loop
-; DISABLEADV: loop.exit:
-; DISABLEADV-NEXT: ret void
entry:
br label %loop
@@ -89,9 +91,9 @@ loop.exit:
;.
; CHECK: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
-; CHECK: [[PROF1]] = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
-; CHECK: [[PROF2]] = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
-; CHECK: [[LOOP3]] = distinct !{!3, !4, !5}
-; CHECK: [[META4:![0-9]+]] = !{!"llvm.loop.peeled.count", i32 2}
-; CHECK: [[META5:![0-9]+]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
+;.
+; DISABLEADV: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
;.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
index d91cb5bab3827..e95121593e4f7 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
@@ -15,13 +15,13 @@
; CHECK: br i1 %{{.*}}, label %[[NEXT0:.*]], label %for.cond.for.end_crit_edge, !prof !16
; CHECK: [[NEXT0]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit, !prof !15
-; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !17
+; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !16
; CHECK: [[NEXT1]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit, !prof !15
-; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !18
+; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !16
; CHECK: [[NEXT2]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit.loopexit, !prof !15
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !16, !llvm.loop !17
define i32 @basic(ptr %p, i32 %k, i1 %c) #0 !prof !15 {
entry:
@@ -84,6 +84,7 @@ attributes #1 = { nounwind optsize }
;CHECK: !15 = !{!"branch_weights", i32 1, i32 0}
; This is a weights of latch and its copies.
;CHECK: !16 = !{!"branch_weights", i32 3001, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 1001, i32 1001}
+;CHECK: !17 = distinct !{!17, !18, !19, {{.*}}}
+;CHECK: !18 = !{!"llvm.loop.peeled.count", i32 4}
+;CHECK: !19 = !{!"llvm.loop.estimated_trip_count", i32 0}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
index 15dce234baee9..dec126f289d32 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
@@ -5,7 +5,7 @@
; RUN: opt < %s -S -profile-summary-huge-working-set-size-threshold=9 -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s --check-prefix=NOPEEL
; REQUIRES: asserts
-; Make sure we use the profile information correctly to peel-off 3 iterations
+; Make sure we use the profile information correctly to peel-off 4 iterations
; from the loop, and update the branch weights for the peeled loop properly.
; CHECK: Loop Unroll: F[basic]
@@ -20,11 +20,11 @@
; CHECK-LABEL: @basic
; CHECK: br i1 %{{.*}}, label %[[NEXT0:.*]], label %for.cond.for.end_crit_edge, !prof !15
; CHECK: [[NEXT0]]:
-; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !16
+; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !15
; CHECK: [[NEXT1]]:
-; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !17
+; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !15
; CHECK: [[NEXT2]]:
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !17
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !15, !llvm.loop !16
define void @basic(ptr %p, i32 %k) #0 !prof !15 {
entry:
@@ -104,6 +104,7 @@ attributes #1 = { nounwind optsize }
!16 = !{!"branch_weights", i32 3001, i32 1001}
;CHECK: !15 = !{!"branch_weights", i32 3001, i32 1001}
-;CHECK: !16 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 1001, i32 1001}
+;CHECK: !16 = distinct !{!16, !17, !18, {{.*}}}
+;CHECK: !17 = !{!"llvm.loop.peeled.count", i32 4}
+;CHECK: !18 = !{!"llvm.loop.estimated_trip_count", i32 0}
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
index a190e94a01489..da13cdc7ef070 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
@@ -40,7 +40,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 12
; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK: middle.block:
-; CHECK-NEXT: br i1 true, label [[BB6:%.*]], label [[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-NEXT: br i1 true, label [[BB6:%.*]], label [[SCALAR_PH]], !prof [[PROF6:![0-9]+]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 87, [[MIDDLE_BLOCK]] ], [ 99, [[BB5:%.*]] ]
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
@@ -48,7 +48,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[AND:%.*]] = and i64 [[IV]], 1
; CHECK-NEXT: [[ICMP17:%.*]] = icmp eq i64 [[AND]], 0
-; CHECK-NEXT: br i1 [[ICMP17]], label [[BB18:%.*]], label [[LOOP_LATCH]], !prof [[PROF6:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP17]], label [[BB18:%.*]], label [[LOOP_LATCH]], !prof [[PROF7:![0-9]+]]
; CHECK: bb18:
; CHECK-NEXT: [[OR:%.*]] = or disjoint i64 [[IV]], 1
; CHECK-NEXT: [[GETELEMENTPTR19:%.*]] = getelementptr inbounds i64, ptr [[ARR]], i64 [[OR]]
@@ -57,7 +57,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK: loop.latch:
; CHECK-NEXT: [[IV_NEXT]] = add nsw i64 [[IV]], -1
; CHECK-NEXT: [[ICMP22:%.*]] = icmp eq i64 [[IV_NEXT]], 90
-; CHECK-NEXT: br i1 [[ICMP22]], label [[BB6]], label [[LOOP_HEADER]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP22]], label [[BB6]], label [[LOOP_HEADER]], !prof [[PROF8:![0-9]+]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: bb6:
; CHECK-NEXT: ret void
;
@@ -98,11 +98,13 @@ attributes #0 = {"target-cpu"="haswell" "target-features"="+avx2" }
;.
; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK: [[PROF1]] = !{!"branch_weights", i32 1, i32 23}
-; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
-; CHECK: [[PROF6]] = !{!"branch_weights", i32 1, i32 1}
-; CHECK: [[PROF7]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META4]], [[META3]]}
+; CHECK: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 24}
+; CHECK: [[PROF6]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK: [[PROF7]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK: [[PROF8]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META10:![0-9]+]], [[META4]], [[META3]]}
+; CHECK: [[META10]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.